Editorial


Each January since becoming Editor, I have used this editorial space to provide a summary of the 
journal statistics of American Documentation for the prior year. These data are also given in the Annual 
Report of the Editor to the American Documentation Institute Annual Meeting; but since almost half 
of the subscribers are not members of the Institute, the information has been repeated. In 1964 and again
in 1965 the cover theme of the January issue has been used to provide a graphic display of the publication
pattern over the years.

Among the statistics given, a differentiation is made between refereed and non-refereed items. The 
latter include letters, book reviews, editorials, contents pages, indexes, and the like. Regular journal 
articles and Brief Communications are refereed. The data for 1966 are tabulated below and should be
considered in the knowledge that the publication budget was reduced by $4500 transferred to Documentation
Abstracts.


Publication Statistics

                          1965    1966
Papers                      38      29
Pages                      381     227
Journal Articles            27      27
Brief Communications        11       2


This brings me to the real subject of the editorial: the unsung referee. Each year I have been
increasingly grateful to the anonymous group of gifted people to whom I send manuscripts of a great 
range of clarity, pertinence and value. I am continually amazed at the care and work devoted by these 
referees to the brainchildren of others. Calculations, tables, figures, and formulas are checked; bibliographies 
are updated and corrected; turgid prose is clarified, etc. Of even greater importance, a general perspective
is provided which enables the Editor to schedule material for publication in a useful and realistic manner. 

My files contain dozens of notes and letters from authors expressing gratitude and appreciation for this 
aid. On occasion acknowledgment is made in the manuscript; often it is not. Remembering that absolutely 
no compensation is given for these labors, I would like to express my personal appreciation and that of the 
Institute to the AD referees—professionals in the truest sense of the word. Thank you. 


ARTHUR W. ELIAS
Graphic Graphics Card Catalog and Computer Index


The development of a low-cost library control system
for visual aids based on punch cards is described.
Miniature pictures of the visual aids are printed on the
cards with a Xerox 914 Office Copier, identifying
information is keypunched, and additional information
is entered by typewriter. An inventory of other forms
of the visual aid, such as glossy photographs, slides,
and [word illegible], is indicated by check marks on a small
form printed on the card. The cards are designed to
permit updating by additional punching as well as by
overpunching previously punched symbols. The punched
data are used by a computer to prepare indexes to the
visual aid collection, and the cards themselves are
used as a manual card catalog for everyday retrieval
of visual aids.


BORIS W. KUVSHINOFF


Applied Physics Laboratory
Johns Hopkins University
Silver Spring, Maryland


The cataloging, storage, and circulation of visual aids
is still a rather new responsibility of the Document
Library of the Applied Physics Laboratory, and the catalog
cards that had been designed initially tended to invite
such improvements as inevitably follow from second
and third thoughts. For example, ever since we learned
of the Time-Life picture catalog cards (1), which display
a small reproduction of the picture together with
descriptive information, we were intrigued with the idea of
including pictures of visual aids on our catalog cards.
In the Time-Life system the original photograph and
descriptive copy (prepared with a typewriter that prints
extra-large characters) are photographed side by side by
a Recordak camera, and 3 x 5 cards are printed
photographically from the microfilm. In principle, this system
is highly attractive, especially by virtue of its simplicity,
but for technical reasons it did not quite fill our needs,
as we shall see presently.

The visual aids in question are for the most part
30 x 40 inch drawings mounted on heavy cardboard (but
can be 15 x 20, 20 x 30, or even 30 x 35 inch flip charts).
Each of these drawings is usually photographed by the
APL Photolab, and an 8 x 10 inch glossy picture, a
2 x 2 inch slide, a 3¼ x 4 inch slide, and an 8 x 10 inch
VuGraph may be prepared, depending on the originator's
requirements. As added complications, the original
visual aid may have one or more overlays, and all or part
of these various forms of back-up material may be in
black and white, in color, or both. Further, the original
or overlay may be revised several times. The problem
has been, therefore, to find a format in which all of
this, as well as other information, could be displayed
economically and in a readable manner on a catalog card.

Everything to be known about visual aids, and presum- 
ably of use and importance, falls conveniently into two 
categories, as indicated in the following list: 


1. Physical Description of Visual Aid and Inventory
   of Back-Up Material

     Chart Size
     Mounted Chart
     Flip Chart
     2 x 2 Slide
     3¼ x 4 Slide
     VuGraph
     Overlay
     Print
     Glossy Photo
     Black and White or Color

2. Historical Information and Identification of Visual Aid

     Title of Chart
     Originating Group or Project
     Date of Chart
     Originator's Number
     Originator's Name
     Artist's Name
     Presentation for Which Chart Was Prepared:
       Date of Presentation
       To Whom Given
       Where Given
       Security Classification
       Charts Used
     Photolab Negative Number
     Security Classification of Chart
     Revisions
[Figure 1: inventory form with check boxes under the headings SLIDES and CHART SIZE]

Fig. 1. Form designed to show status of back-up material and physical description of original visual aid. "G" stands for
glossy photograph; "P" for Ozalid, Multilith, or Xerox print as applicable; "V" stands for VuGraph; and "FC" for flip chart,
which is always 30 x 35 and therefore needs no size breakdown.


The solution to the problem posed by the first column
almost suggested itself, and proved easy to enter,
compact, and readable. The small form reproduced here is
filled out as necessary and is marvelously simple and
effective.

This form takes little space on the catalog card, and
yet, by putting X's in the applicable squares, takes care
of all the possible combinations that can exist for a visual
aid and its back-up material.

The problem posed by the information in column 2 of
the list was not so easy to cope with. Titles can be long,
there may be subtitles, and considerable space can be
taken up by each of the other items in the list. This
clearly indicated the need for some shorthand or coding
system. At this point in the development of the
improved visual aid (VA) system, two factors came into
play, one a matter of personal preference, the other an
out-and-out misconception. Were it not for this
misguided good fortune, the trail might not have led where
it did.

The original visual aid catalog consisted of 5 x 8 cards,
which provided for most of the information given in the
list and allowed for several subjects. To one unfamiliar
with the format on these cards, the arrangement of the
information seemed to be not as effective as it might be.
Since we were predisposed to a catalog card format in which
the information is printed high on the card (to facilitate
easier reading in well-filled drawers), the rearrangement
tended to reposition the required information upward,
so that we got a configuration as shown in Fig. 2. The
misconception was that we thought the negatives made
by the photolab were 3¼ x 4¼ (instead of 4 x 5 and 8 x 10,
as they actually are). Mindful of costs, we wanted to
avoid the expense of rephotographing the entire
collection. As it turned out, 3¼ x 4 slides are available for a
large part of the visual aid collection, quite a
convenience for the diazo process, which works most simply
with a positive transparency.¹

By reducing some of the desired information into
coded form, repositioning it high on the card, and allowing
space to the right for a 3 x 4 picture, we were left with
considerable blank space at the bottom.


¹ Somewhat surprisingly, black and white slides of line drawings
reproduce quite well on the Xerox 914 Office Copier.

Then, almost by intuition, we realized that if we got 
rid of all the blank space at the bottom of the card, we 
had a shape very similar to that of a punch card (out- 
lined area in the figure). This idea, quite naturally and 
directly, led to all sorts of other ideas. If the catalog 
cards could be made of stiff photographic paper, and if 
the cards could be keypunched and fed to a computer 
for indexing purposes, we would indeed have a versatile 
system. Not only could we have a manual card catalog,
we could provide our users with a variety of
computer-printed indexes of the entire collection.

There were two methods immediately available to us 
for putting pictures on cards. One was Xerography, the 
other photography. Since the APL Photolab already 
had negatives of one size or another, we decided to ex- 
plore the photographic technique first. 

Knowing that typing on photographic emulsion is
impractical, we wrote to Eastman Kodak² to see if we
could obtain die-cut, dimensionally stable cards about 6.5 mils
thick, with photographic emulsion on one half and bare
paper on the other. The answer was that this was
technically feasible, but exorbitant in cost for a small market.
Dimensional stability is enormously difficult to maintain
because photographic paper is wetted during
development and fixing, and usually is dried by heat.
Furthermore, emulsion and paper have different hygroscopic
characteristics and different coefficients of expansion.
This, compounded by core curl that often sets during
manufacture, pretty much eliminates ordinary
photographic paper, as we know it, for use as punch cards.
One positive benefit of the exchange with Eastman Kodak
was their suggestion of a typewriter ribbon that would
"take" on emulsion and the suggestion that a clear acrylic
spray could be used to eliminate rub-off.

Meanwhile, approaching the problem from another
angle, we evolved a photosensitive diazo punch card
(Fig. 3). To test the idea, we stimulated the
inventiveness of two friends³ in the Laboratory, and with their
assistance hand-manufactured a few cards to determine
if they would feed through a keypunch machine and
subsequently act as inputs to the IBM 7094 Computer.
Using 102-Z Ozalid paper, a product that had recently
become available and which is plastic coated on the back,
we succeeded in making a productive run, even though
the cards were inadvertently trimmed slightly smaller
than standard punch cards.


Fig. 3. Final version of the Graphic Graphics Catalog Card
[text missing] space. Black outline shows relative size of a standard tab card.


² Thanks are extended to Mr. John P. Eager, Vice President of
Recordak, and Mr. H. H. Haggedorn, Product Specialist, for several
useful [text missing].

³ L. W. Fraser and Henry Jorgensen, Fleet Systems Division, APL.

Briefly, the cards were made as follows: We copied
a 3¼ x 4 glass slide on diazo film, made a positive
transparency of a paste-up of the standard matter that we
wanted on each card, and stripped the two pieces of film
together. We were now in business to print our sample
cards.

The test cards appeared much like the final version 
shown in Fig. 3, and despite the fact that special care 
was not taken to achieve quality, everything on the test 
cards was readable. The title and subjects were typed on a 
Synchrotape,⁴ and the typed line across the top of the
card was produced by the IBM 26 keypunch machine. 
The few holes that appear in the picture are not at all 
troublesome, since the main purpose of the illustration 
is for recognition of the original. 

Before discussing one or two of the more interesting 
features of the system, let us briefly review its auto- 
matic indexing capabilities. Most of the fields on the card, 
shown in Fig. 3, are self-explanatory, but some deserve 
additional comment. 

From the outset, we tried to maintain a capability of
updating the cards because it would be costly in both
time and money to have to produce a new set of cards
just to change a security classification, for example, or
to indicate a revision. Therefore we used a field of three
columns for the security classification, the first of these
for "secret," the second for "confidential," and the third
for "unclassified." If a secret chart is downgraded, a C
is entered in the column next to the S, and if the chart
is completely declassified, a U is entered next to the C.
Thus, the symbol farthest to the right on a card is the
correct classification of the item. The column directly to
the left of this field is used to indicate the security
classification downgrading group number, and the column
directly to the right is used to enter a symbol indicating
proprietary information, if applicable. Five columns
are allowed for the various types of back-up material.
Thus, if a glossy print or a slide is made up some time
after the cards have been filed in the catalog, the cards
can be pulled and additional punches made in the card.
The only difficulty in reading this field may be in
distinguishing the first S (standing for 2 x 2 slide) from
the second S (standing for 3¼ x 4 slide). But one need
only glance down at the inventory form in the lower left
corner to ascertain which S it is; furthermore, in the
indexes there will be neighboring entries that will dispel
any possible confusion.
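
The rightmost-symbol rule for the security field can be illustrated with a short sketch. It is only an illustration under stated assumptions: the three-character string, the use of a space for an unpunched column, and the function names are all hypothetical, and nothing here reproduces the actual card layout or any program used at the Laboratory.

```python
# Columns of the three-column security field, left to right:
# secret, confidential, unclassified.
FIELD_ORDER = ("S", "C", "U")


def current_classification(field: str) -> str:
    """Return the symbol farthest to the right in the field.

    Assumption of this sketch: an unpunched column is a space.
    """
    if len(field) != len(FIELD_ORDER):
        raise ValueError("security field must be three columns wide")
    for column in range(len(FIELD_ORDER) - 1, -1, -1):
        symbol = field[column]
        if symbol == " ":
            continue  # column not punched, look further left
        if symbol != FIELD_ORDER[column]:
            raise ValueError(f"unexpected symbol {symbol!r} in column {column + 1}")
        return symbol
    raise ValueError("no classification has been punched")


def downgrade(field: str, new_symbol: str) -> str:
    """Punch an additional symbol to the right; earlier punches stay."""
    new_column = FIELD_ORDER.index(new_symbol)
    old_column = FIELD_ORDER.index(current_classification(field))
    if new_column <= old_column:
        raise ValueError("a downgrade must punch a column to the right")
    return field[:new_column] + new_symbol + field[new_column + 1:]


card = "S  "                      # chart enters the system as secret
card = downgrade(card, "C")       # later downgraded to confidential
card = downgrade(card, "U")       # finally declassified
print(repr(card), current_classification(card))
```

Because a downgrade only adds a punch, the same physical card records the whole history of the classification while the current value is always read from the right.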


⁴ A Synchrotape was used because it had small type, and we needed
several cards for testing; actually, any typewriter may be used.
The Revision and Overlay fields are used in
conjunction with the VA number field. A separate card
is made up for each overlay, which is given the same
number as its associated chart, but with an alphabetical
suffix to indicate that it is an overlay. An original overlay
is indicated by -A, its first revision by -B, etc.
Revisions of charts are indicated by numerical suffixes.

Assume that a chart numbered CH 543 is the 123rd
chart to enter the VA system; the chart is assigned the
number 123. Now assume that the chart is revised. The
catalog card numbered 123 is updated by entering -1
in the REV field, and an asterisk is punched after the
number 123, i.e., 123*. After being added to the
computer index tape, the card is returned to the catalog to
represent all of the back-up material that has not been
revised. Meanwhile, the revised chart numbered CH 543
(now VA 123-1) is photographed by the photolab, and a
new card is made up by the Document Library. This
new card is filled out and keypunched as usual and
reproduced. The keypunched card is added to the master index
tape and then filed. We now have two sets of cards
related to one basic chart, one showing the way it was,
the other showing the way it is.

Now, if the same chart is revised again, the new
version is numbered 123-2, and a new card is made up and
processed as before. Meanwhile, the old cards are
updated by overpunching the hyphen in the REV field
of the first card with an asterisk and doing the same in
the VA field of the second card, and adding -2 in
the REV field of the second card. These steps are
shown in sequence in Fig. 4.

Thus, the hyphen in the VA field indicates that the
version of the original visual aid as displayed on the card
still exists, and an asterisk indicates that it has been
revised or intentionally destroyed. The hyphen and
asterisk are used because cards do not have to be remade;
an asterisk can be punched over a hyphen, thereby
updating the card and indexes by a single stroke.
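
The hyphen-to-asterisk update can be sketched as a small data model. This is a deliberately simplified illustration: it tracks only the marker that says whether the version shown on a card still exists, and it does not model the cross-references punched into the REV field. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CURRENT = "-"      # the version shown on the card still exists
SUPERSEDED = "*"   # the version has been revised or destroyed


@dataclass
class Card:
    va_number: int
    revision: int = 0        # 0 = original chart, 1 = first revision, ...
    marker: str = CURRENT

    def label(self) -> str:
        """VA number with its numerical revision suffix, e.g. 123-2."""
        if self.revision == 0:
            return str(self.va_number)
        return f"{self.va_number}-{self.revision}"


def add_revision(cards: List[Card]) -> Card:
    """Overpunch the newest card's marker and file a card for the new version."""
    latest = cards[-1]
    latest.marker = SUPERSEDED          # one stroke; the old card is not remade
    new_card = Card(latest.va_number, latest.revision + 1)
    cards.append(new_card)
    return new_card


catalog = [Card(123)]       # chart enters the system
add_revision(catalog)       # first revision, VA 123-1
add_revision(catalog)       # second revision, VA 123-2
for card in catalog:
    print(card.label(), card.marker)
```

Every earlier card stays in the file with an asterisk, so the catalog keeps a record of each version while only the newest card carries the hyphen.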

The OV (overlay) field is used in exactly the same
way as the REV field, except that letters -B, -C,
-D, etc. are used instead of digits. Complete sets of
cards are made up for overlays just as for the charts 
themselves, the original overlay being numbered, for 
example, 123-A, simply to indicate that it is an overlay 
and to distinguish it from a chart. 

The last two fields on the card are reserved for the 
originator. He is free to use them in any way he wishes 
to arrange his particular set of visuals in a systematic 
manner. As long as he uses a system, his set of visual 
aids will be grouped together in the index. A sample of 
the VA index is shown in Fig. 5.

[Figure 4: three stages (chart enters system; 1st revision enters system; 2nd revision enters system). Card 1 is prepared, then * and -1 are entered; Card 2 is prepared, then - is changed to * and -2 is added; Card 3 is prepared.]

Fig. 4. Sequence of steps for updating revisions


[Figure 5: sample page of the Visual Aid Index by VA Number, with columns VA NO, REV, OV, DATE, CLASS, BW NEG, COL NEG, SSGPV, PR, ORIG NO, ORIG DESIG.]

Fig. 5. Sample of visual aids index


American Documentation, January 1967


As currently planned, there will be four indexes: (1)
by VA number; (2) by negative number; (3) by
originator's designation; and (4) by originator's number.
After a master card is made up, filled in, and keypunched,
five copies will be reproduced by Xerox on Xerograkards:
three or four will be filed by subject, one will be presented
to the originator for his use, and the master will be filed
by VA number after it has been passed through the
computer for the indexes.
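
As a rough modern analogue of the four planned indexes, the same set of punched records can simply be sorted four ways. The sketch below is hypothetical throughout: the field names are invented, the three sample records are made up for illustration and are not taken from the collection, and no claim is made about how the IBM 7094 program was organized.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    va_number: int
    negative_number: int
    originator_designation: str
    originator_number: str


# One sort key per planned index.
INDEX_KEYS = (
    "va_number",
    "negative_number",
    "originator_designation",
    "originator_number",
)


def build_indexes(entries: List[IndexEntry]) -> Dict[str, List[IndexEntry]]:
    """Return the same entries ordered once for each index."""
    return {key: sorted(entries, key=attrgetter(key)) for key in INDEX_KEYS}


# Invented sample records, for illustration only.
entries = [
    IndexEntry(291, 49959, "FSO", "APL-115"),
    IndexEntry(98, 53558, "SSD", "CH-3"),
    IndexEntry(349, 53181, "TMO", "CH-16"),
]

indexes = build_indexes(entries)
for key, ordered in indexes.items():
    print(key, [entry.va_number for entry in ordered])
```

Note that the originator's number is compared here as plain text, so "CH-16" sorts before "CH-3"; a production index would need an agreed collating rule for such mixed designations.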

One feature of these cards is especially appealing from
the user's standpoint. He need not use the cumbersome
originals or even slides, VuGraphs, or glossy prints to
organize a presentation. Transparencies must be viewed
by transmitted light, and 8 x 10 glossy prints are too
large to spread out in any number for comfortable
simultaneous examination. The cards, on the other hand,
are not only compact, but carry all of the information
to be known about the visual aid, in addition to a
small picture of the original itself. A staff member
preparing a presentation can spread out as many as
30 to 40 of these cards on his desk, make his selection,
choose the sequence of presentation, and then send the
cards, in the order desired, to the Document Library.
The Document Library can then arrange slides or
VuGraphs in the same order, place them in a carrying
box, and charge them out to the requester.

At the presentation, the staff member could
conceivably use the cards as cue cards and might even be
able to get along without a typed speech. In any event,
he will know exactly what is coming up next, what it
looks like, what the title is, etc., by flipping the cards
one by one as he proceeds through his talk. He need
not crane his neck to see the screen while he talks.

The storage and circulation system is the essence 
of simplicity. The backs of the cards are printed 
as shown in Fig. 6. The charts themselves are stored 
in several slotted cabinets labeled A, B, C, etc. The 
slots in each cabinet are numbered 1, 2, ..., 30. The
shelf location of each chart is indicated by the appro- 
priate combination of cabinet letter and slot number. 

The cabinet slots are provided with rollers, and the 
edges of the charts (stored back to back, two to a 
slot) are protected with tight-fitting metal channels 
running along the 40-inch edges. Since the charts are 
stored back to back, it is easy to determine which of 
the two is desired, simply by looking at them. If a
chart is circulated, the borrower's name is written
on the back of the card, thereby becoming a permanent
record.

[Figure 6: form printed on the back of the card, with columns SHELF NUMBER and CHARGED TO]

Fig. 6. Charge-out and storage location record


The last problem involving circulation was solved
so easily that we have forgotten why we thought it
serious. If someone requests charts, slides, and VuGraphs
at one time and returns them at different times, how
do we check these items in and out quickly without a
lot of writing and record-keeping? Our solution,
certainly not the only one, is to slip the appropriate card
of the VA file into a transparent sleeve. A marking
pencil is used to circle the appropriate boxes of the
inventory form. When some items are returned but
others not, the circle around the returned item is wiped
off or scribbled out. When everything is returned, the
card is removed from the sleeve and the sleeve is
discarded (if it is disposable), or wiped clean and used
again (if it is of sturdy acetate). These sleeves cause
the card to stand up a fraction of an inch above the
other cards in the file, which is a handy feature in itself.
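
The sleeve procedure amounts to keeping a set of circled items per card. A minimal sketch, assuming hypothetical item names and a hypothetical class name, might look like this; it is meant only to make the bookkeeping explicit, not to describe any program used by the Document Library.

```python
from typing import Iterable, Set


class SleevedCard:
    """A catalog card in a transparent sleeve with circled inventory boxes."""

    def __init__(self, va_number: int) -> None:
        self.va_number = va_number
        self.circled: Set[str] = set()   # items currently charged out

    def charge_out(self, items: Iterable[str]) -> None:
        """Circle the box for each item that leaves the library."""
        self.circled.update(items)

    def check_in(self, items: Iterable[str]) -> bool:
        """Wipe off the circles for returned items.

        Returns True when nothing is left circled, i.e. the sleeve
        can be removed from the card.
        """
        self.circled.difference_update(items)
        return not self.circled


card = SleevedCard(123)
card.charge_out(["chart", "2 x 2 slide", "VuGraph"])
print(card.check_in(["2 x 2 slide"]))        # some items still out
print(card.check_in(["chart", "VuGraph"]))   # everything returned
```

The card itself never has to be rewritten: all transient state lives on the sleeve, just as in the manual procedure.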

We have already proved that the diazo process can 
be used. Two factors make it especially attractive: 
the 102-Z Ozalid paper appears to have the dimensional 
stability and flatness to serve for punch cards even 
after rather abusive handling. Material as well as equip-
ment costs are about as moderate as one could possibly
hope for, and quality of the end-product is very good, 
especially if halftones are involved. Although we are 
testing other photosensitive papers, the plasticized diazo 
paper commercially available seems best able to satisfy 
all minimum requirements of our system. 

The greatest drawback of the diazo process in this
application is its slowness. Xerograkards,⁵ which are
perforated punch cards, four to a sheet, designed for use
with the 914 Xerox Office Copier, require much less time
to produce.

With these cards we found we need to type, punch,
and print only one card, and reproduce six copies of four
different cards at a time in a little over half a minute, as
compared to something like 20 minutes by diazo.

The necessary copies of the cards can be produced
in various ways, but the first step in each case is to
make a reproducible master card that has all the
information on it, including the picture. Two methods appear
practical in our case:


1. We offset-print a quantity of perforated sheets
with standard material (including the charge-out
form on the back). To produce the master cards,
we take photographic prints of the right size,
position them in a jig, and Xerox one copy on the
preprinted sheets. Since there are four cards to a
sheet, we can make up cards for four charts at
once. These cards are torn apart, typed,
keypunched, marked as necessary, and reproduced five
times on plain sheets for the remaining cards
needed. Since this means Xeroxing a Xerox copy,
we have to be content with background and haze
on the cards. Also, very small type is just barely
readable by magnifying glass.

2. The second method is to produce the master card
on diazo paper. The quality of a diazo master card
is considerably higher in comparison with Xerox,
and better quality Xerox copies result.
Furthermore, halftone images and images with large solid
areas come up much better on diazo. It must be
noted, however, that the diazo process is tedious
and time-consuming as compared to the Xerox
method.


⁵ Trademark of Busiforms, P. O. Box 84085, San Francisco, Calif.
94134.


Anyone who has reached this point and remembers
that we have not provided for the illustrator's name
on the card must be satisfied with this explanation:
We have passed this responsibility to the originator, who
has two 10-column fields in which to put this information
if he so desires. The illustrator's name is sometimes quite
important, for having done the original work, he requires
much less instruction and information to make quick
changes in updating the visual aid.

Other applications of the graphic punch card have 
been explored in a superficial way. The system could 
certainly find use in picture libraries, in art museums, 
and possibly in newspaper libraries. These cards are 
readily machine sortable. There may be very real appli- 
cations for them in medical research (2) and perhaps 
even in criminology in connection with fingerprint identi- 
fication and the “mug” file. It would be extremely 
interesting to see if graphic tab cards could be used in 
identifying and classifying satellite weather photographs.


Anyone having access to a computer with video and
optical capabilities might find it interesting to explore
the possibility of coating the back or edge of a graphic
card with magnetic oxide. Graphic images and other
kinds of data could be stored magnetically on such
cards. These cards could be processed by magnetic
reading heads as well as by optical scanners. One likely
application would seem to be for parts catalogs (3).
Descriptive material and pictures of the parts shown on
the card could be used visually by editors. The cards
could then be fed to an automatic composing machine.

Estimated cost for programming and for IBM 7094 
computer time for the preparation of four indexes for 
a collection of 5000 visual aids is $900. Printing 30 
copies of a 300-page index is estimated at $60, and the 
cost of the perforated Xerograkards is $376 (10 M sheets, 
40,000 cards). 

The over-all cost is thus less than $1,400. Adding 
approximately $500 for Xeroxing and miscellaneous 
items, we should be operational with this collection for 
about $2,000, or 40 cents per visual aid. Salaries and 
overhead are not included. These rough figures are 
provided merely to give a general idea of what such 
a system might cost.
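
The arithmetic behind these round figures can be checked in a few lines. The sketch below uses only the dollar estimates quoted in the text; the variable names are ours.

```python
# All figures are the estimates quoted in the text, in whole dollars.
indexing = 900          # programming plus IBM 7094 time for four indexes
printing = 60           # 30 copies of a 300-page index
card_stock = 376        # 10,000 perforated sheets of Xerograkards
miscellaneous = 500     # Xeroxing and miscellaneous items (approximate)
collection_size = 5000  # visual aids in the collection

subtotal = indexing + printing + card_stock   # 1336, "less than $1,400"
total = subtotal + miscellaneous              # 1836, rounded up to "about $2,000"
cards = 10_000 * 4                            # four cards per sheet

cents_per_aid_exact = total * 100 / collection_size     # from the itemized sum
cents_per_aid_rounded = 2000 * 100 // collection_size   # from the rounded budget
cents_per_card = card_stock * 100 / cards               # cost of blank card stock

print(subtotal, total, cards)
print(cents_per_aid_exact, cents_per_aid_rounded, cents_per_card)
```

The itemized sum comes to a little under 37 cents per visual aid; the quoted 40 cents follows from rounding the total up to $2,000, and the blank cards themselves cost just under one cent each.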
The Use of Second Order Descriptors for Document Retrieval *


It is proposed that one way to increase the efficacy
of document retrieval is to define to the computer the
descriptors used to index the file. A computer
program written in COMIT to implement the proposal and
to facilitate testing its capabilities is described.
Definitions are given to the computer as a string of terms
called a "definitor." These terms, which act as "second
order descriptors," are not normally those used as file
descriptors. Their introduction provides a controllably
broader base for link-finding and match-counting
operations by the computer. It also makes possible such
things as introducing new terminology and biasing
existing descriptor indexes towards special interests or
languages without having to re-index the file. The
program computes a "pseudometric distance" between a
query and each document and prints an ordered list
of those documents closer to the query than some
chosen cut-off value. (Large files would probably
require some preselection, such as that which would
result from use of a concordance.) It then substitutes for
each descriptor its definitor and repeats the above
process. The result is that the subjective human
judgment required to evaluate the efficacy of introducing
the definitors is reduced to a statement as to which list
would be considered more useful. Use of the program
to date has been only as a demonstration, so no
conclusions can be stated other than that the
demonstration results would seem to indicate that testing on a
serious scale should be undertaken. (This paper is a
result of work sponsored by The MITRE Corporation's
Educational Assistance Program.)


MILES A. LIBBEY


Director


* A condensed presentation of this paper was given to the 31st
Congress of the International Federation for Documentation in Washington,
D. C., on October 14, 1965.


• 1. Introduction


It is general practice today to use strings of natural
language terms to represent to computers the conceptual
contents of documents. This is called "coordinate
indexing" and the terms used, whether single words or
simple phrases, are usually called "descriptors."
Although the debate continues as to the advantages and
disadvantages of coordinate indexing compared to other
techniques, the fact is that essentially the only access
today to the conceptual content of a truly large number
of, if not most, documents of current scientific and
technical interest is by means of their assigned
descriptors. If access by automatic means is stipulated, then
these descriptors constitute almost the sole entry to
that literature in machine-readable form. Not only
is a large part of the literature already entrusted to
descriptors for safekeeping, the rate of addition to this
store is increasing both numerically and proportionally.
Considering the vast investment of resources (including
but not limited to money) in the research and
development results so stored and the vast investment in
current R & D in progress which needs to be related to
that store, it is an urgent matter to do anything within
reason that would offer some hope of wringing greater
utility from those descriptors which have already been
assigned. This is in addition, of course, to continuing
our efforts to learn how to do better indexing in the
first place. The proposal I am making concerns itself
directly only with getting automated information
retrieval systems to make better use of descriptors that
have already been assigned. However, I feel sure that
it would have implications for the descriptor assignment
process in any information system which was using
it, and would interact therewith in a beneficial and
complementary fashion.
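
The two-pass procedure summarized in the abstract (rank documents against a query by distance, then substitute definitors for descriptors and rank again) can be sketched as follows. Several things here are assumptions rather than the paper's method: the pseudometric is not defined in this section, so Jaccard distance between term sets stands in for it; a descriptor with no definitor is kept unchanged; and the documents, terms, and definitors are invented for illustration. The original program was written in COMIT, not Python.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Definitors = Dict[str, Tuple[str, ...]]


def expand(descriptors: Iterable[str], definitors: Definitors) -> Set[str]:
    """Substitute for each descriptor its definitor (second order descriptors)."""
    expanded: Set[str] = set()
    for descriptor in descriptors:
        # Assumption: a descriptor without a definitor stands for itself.
        expanded.update(definitors.get(descriptor, (descriptor,)))
    return expanded


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Stand-in distance: 0.0 for identical sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def ranked(query: Set[str], documents: Dict[str, Set[str]],
           cutoff: float) -> List[Tuple[float, str]]:
    """Ordered list of documents closer to the query than the cut-off."""
    scored = [(jaccard_distance(query, terms), name)
              for name, terms in documents.items()]
    return sorted(pair for pair in scored if pair[0] < cutoff)


# Invented file of two documents and a one-term query.
documents = {"doc_a": {"poodle"}, "doc_b": {"butter"}}
query = {"dog"}
definitors: Definitors = {
    "poodle": ("poodle", "dog", "pet"),
    "dog": ("dog", "animal", "pet"),
}

first_pass = ranked(query, documents, cutoff=1.0)
second_pass = ranked(
    expand(query, definitors),
    {name: expand(terms, definitors) for name, terms in documents.items()},
    cutoff=1.0,
)
print(first_pass)
print(second_pass)
```

In this toy file the first pass retrieves nothing, because no document shares a descriptor with the query, while the second pass brings in the one document whose definitor overlaps the query's. Comparing the two lists is the kind of judgment the abstract describes.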


• 2. Fundamental Considerations

The rationale for my proposal requires some
consideration of fundamental principles of coordinate
indexing. I assume that it is generally accepted that some
degree of terminology control is required for good
coordinate indexing. B. C. Vickery has neatly
summarized the general view as to how this is done as follows:
"The control of terms for use as descriptors is essentially
a matter of establishing relations between words." (1)

This statement reflects the viewpoint, apparently now
universally held, that an important, even characteristic,
part of the process of indexing a document is deciding
for all time just what aspects of its substantive contents
may some day be of interest. (I use the word
"document" in its broadest sense.) Now, obviously, if a
document is about guns and butter, and the indexer only
mentions guns, the fact that it was also about butter will
be lost to posterity as far as access through that index is
concerned. But it is usually assumed that such decisions
will be made correctly. What the bulk of the literature
on the specifics of coordinate indexing (and indexing in
general) is about is how to decide, during the indexing
process, what relations to show and how to show them.
In general this is reflected in Mr. Vickery's statement,
and the term "establishing" furthers a connotation of
finality in the fixing of some decision or the
consummation of some act.

But what would happen if, in the indexing process, a
document on poodles did not get indexed also by the
term “dog”? Would this mean that the chance to use the
generic relation of “dog” to “poodle” has been lost forever?
There seems to be no reason, at least in principle,
why it should. Regardless of the acts of any indexer, a
poodle is still a kind of a dog and will stay such. There
are other terms that “poodle” implies, or could according
to one's interests imply: “animal,” “pet,” “canned
dog food,” “clipping parlor,” etc. Presumably, a human
acting as an intermediary between a “user” and an
information store knows these things and makes use of
them, either consciously or subconsciously, in deciding
how to approach the information store to best help the
user. That is, the chasm between the terms in which the
user expresses his request or query and the terms in
which the contents of the information store is expressed
is bridged by the capability of the human intermediary
to relate these sets of terms through his knowledge of
their meaning. The principal component of this knowledge
is [...]ly derived from the characteristics of a
natural language, both general and specific, and in the
ability of a native speaker to exploit them, both consciously
and subconsciously. Other components undoubtedly
derive from knowledge of the information store
itself, of the subject matter, and perhaps of the personal
needs or viewpoints of the user.

But the time is about here when direct access by users
(which may in some cases be other automated systems!)
is the order of the day. Is there a way to include in the
automated system any effective substitute for the knowledge
of meaning previously supplied by the human intermediary?
I think there is.

The relations between words that we need to use in
the process of retrieving desired information from the
store are not relations that some indexer might (or
might not) have “established.” They are, rather, relations
which exist intrinsically, inherently, and potentially
in terms by virtue of their being part of a natural
language. The important implication of this is that
to the extent that descriptors are used in their natural
language sense, those relations, being inherent in them,
remain just as available for selection and exploitation
during the retrieval process as they are for the original
indexing process. For example, “poodle” still implies
“dog.” Furthermore, since there is no reason for limiting
ourselves to one or two relations of special concern, we
can, in principle at least, list relations in sufficient number
and detail, and, if we choose, in such a holistic fashion,
as to effectively approach the substantive or referential
effect of a definition, i.e., of an indication of meaning.


The problem then becomes one of practicability. How 
can this “inherency” of relations be utilized to enable the 
selection and exploitation during the retrieval process of 
relations of special interest or to provide some effective 
substitute for that knowledge of meaning heretofore con- 
tributed by a human intermediary, or, better yet, both? 
My proposal for doing this combines the complementary 
powers of what I call a "definitor" with those of a 
quantifying and normalizing function such as a “pseu- 
dometric.” These are explained in the next two sections. 


* 3. The Definitor


By a “definitor” I mean a string of terms that serves 
the computer as a surrogate for a definition, defining to
it the descriptors used to index the file. The definitor 
provides the mechanism for making any of those rela- 
tions that are inherent in terms explicit and accessible, 
hence manipulable. It simultaneously provides a vehicle 
for carrying in some useful sense information about the 
meaning of those terms entering into the retrieval opera- 
tion, previously provided by the human intermediary. 

Each constituent term of the definitor will, as I cur- 
rently envision it, be a single term, symbol, or simple 
phrase. Taken together, as a definitor, these terms are 
used to characterize the meaning of a file descriptor in 
much the same manner as the string of file descriptors is 
used to characterize the substantive contents of the docu- 
ment they were assigned to index. They could be used to 
retrieve, relate, and compare terms just as the file de- 
scriptors are used to retrieve, relate, and compare docu- 
ments, and in fact I have so used them. (2) For the 
present purposes, however, definitors are used to replace 
the file descriptors in the “description” of a document. 
Such replacement results in the definitor constituents 
assuming a new role of acting directly as new document 
descriptors. I call these “second order descriptors” to
refer to this role and yet retain the distinction between
these and the original file descriptors. Correspondingly,
I sometimes refer to the file descriptors as “first order
descriptors” when comparisons are being discussed.

These second order descriptors usually would not be 
those used as file descriptors. As far as I can see, how- 
ever, it will always be advantageous to include a repeti- 
tion of the file descriptor as one of the constituents of its 
own definitor. As will be seen later, we will be looking 
for matches, so why throw away possibilities for direct 
matches? 

Second order descriptors would not necessarily be
taken from any kind of a terminology control list. I
would expect that in practice most of them will come 
from various sources, most of which could be construed 
as terminology control lists in some sense. This will be 
clear from the examples below. 

A definitor, as I think of it, would not have to be
formated, but it probably would be. By “formated” I 
mean that certain positions (or groups of positions) in 
the definitor would be dedicated to terms of some partic- 
ular type or source. Such formating can be used to 
provide additional semantic information or to introduce 
a syntactic structure into the definitor itself. 

If definitors become widely used, whether for docu- 
ment retrieval or for other applications where semantic 
factors have to be automated, their design and construc- 
tion will undoubtedly become a subject for study in its 
own right. In general, I would expect that each applica- 
tion would require a different technique. In any event, 
the samples I give here are not meant as models for 
definitors. In fact, only the last of these, the one for 
OAK, even begins to indicate the concept of a definitor 
on which most of my comments in this paper are based. 
The first four samples are discussed only because they are 
representative of the definitors used in getting the test 
results reported later in this paper. 

The first two are taken from the dictionary used in the 
first of two test runs on a computer: 


DETECTION = DETECTION + GDDF + 
SENSORS + SEARCH 

RADAR SIGNALS = RADAR SIGNALS + ELT + 
ELECTROMAGNETIC + WAVES + 
DETECTION 


In these the + sign is used (as it is in the COMIT pro- 
gramming language) to separate constituents. “GDDF” 
is an abbreviation for “GENERAL DETECTION AND 
DIRECTION FINDING” and “ELT” is an abbrevia- 
tion for “ELECTRONICS.” These are formated accord- 
ing to: 

T = T + G + A + B


where T stands first for the first order descriptor being 
“defined” and then, on the right of the = sign, for its 
repetition as one of the second order descriptors com- 
prising the definitor. G stands for a generic term taken 
from the middle one of three hierarchical levels of the
terminology control list used by the MITRE Library. 
It, therefore, was in a first order generic relation to the 
descriptor being defined since the actual file descriptors 
comprised the lowest of the three levels. The A and B 
stand for additional terms. These were assigned quite 
freely — in fact mostly while I was sitting at a keypunch 
— and without referring to the test corpus of documents 
or to other definitors.

These are about the minimum I would consider as 
still retaining the flavor of a holistic definition-surrogate 
in any sense. For while some dictionary definitions do 
consist of a single synonymous term, this assumes the 
associative/cognitive powers of a human intellect to 
therefrom “understand” a meaning. In principle, this 
could presumably be mechanized by sufficiently iterating 
the dictionary look-up procedure, but such iterations are 
not immediately contemplated here. In these minimal 
definitors the principal — and only systematic — associa- 
tion-making power is provided by the generic terms. 
DOPPLER SYSTEMS, another first order descriptor 
used in the test corpus, was classified by the terminology 
control list in the same group as DETECTION, there- 
fore would have GDDF for its generic definitor con- 
stituent, as did DETECTION. Thus, the computer is 
enabled to “know” that there is some relation between 
DOPPLER SYSTEMS and DETECTION in terms of 
some aspect of their meanings to humans. Not only is 
the information that some relation exists between these 
two terms thus made available to the computer, but in 
this particular example additional information as to the 
kind of relation is gratuitously available because of the 
formating; i.e., that they both stand in the same specific/generic
relation to some term (which can be identified if
need be). 
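
The way a shared generic constituent lets the computer relate two descriptors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function names `shared_constituents` and `same_generic` are hypothetical, the first two definitors are copied as printed above, and the DOPPLER SYSTEMS entry lists only the two constituents stated in the text, since its free terms are not given.

```python
# Definitors from the first test run, copied as printed in the text.
# Position 0 is the repetition term T and position 1 the generic term G.
# The DOPPLER SYSTEMS entry is deliberately incomplete: only its
# repetition term and its generic term (GDDF) are known from the text.
DEFINITORS = {
    "DETECTION": ["DETECTION", "GDDF", "SENSORS", "SEARCH"],
    "RADAR SIGNALS": ["RADAR SIGNALS", "ELT", "ELECTROMAGNETIC",
                      "WAVES", "DETECTION"],
    "DOPPLER SYSTEMS": ["DOPPLER SYSTEMS", "GDDF"],
}


def shared_constituents(first, second, definitors=DEFINITORS):
    """Return the second-order descriptors that two descriptors share."""
    return set(definitors[first]) & set(definitors[second])


def same_generic(first, second, definitors=DEFINITORS):
    """True when both descriptors carry the same term in the G position."""
    return definitors[first][1] == definitors[second][1]


# Systematic relation: both descriptors sit under the same generic term.
print(shared_constituents("DOPPLER SYSTEMS", "DETECTION"))
print(same_generic("DOPPLER SYSTEMS", "DETECTION"))

# Nonsystematic relation: DETECTION happens to be a free term
# in the definitor for RADAR SIGNALS.
print(shared_constituents("RADAR SIGNALS", "DETECTION"))
```

Keeping each definitor as an ordered list rather than a bare set preserves the formating, so the kind of relation (here, a common generic term) remains recoverable as well as the mere fact of a relation.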

Far more information than just exemplified can be
provided by definitor systems that are properly designed, 
implemented, and utilized. Before leaving the sample 
definitors above, one further instance can be noted. In 
this case, a nonsystematic relation between DETEC- 
TION and RADAR SIGNALS is made available to the 
computer by virtue of the appearance of DETECTION 
as the fourth definitor constituent for RADAR SIG- 
NALS. Now, it was my intention — and to the best of 
my knowledge it was carried out — to assign the last two 
definitor constituents in this particular set quite freely 
(not, however, amounting to what a psychologist would 
call “free association”), especially making it a point not
to refer either to the test documents or to the other 
definitors. It can, therefore, be considered that the assign- 
ment of DETECTION to RADAR SIGNALS, hence the 
relation made available to the computer by this assignment,
was fortuitous. I feel that the appearance of relations
in this manner is one of the virtues, rather than one
of the vices, of definitors, and that part of whatever 
power they give the computer to emulate the human's 
intermediacy will derive therefrom. 


The second two definitor samples are taken from the
dictionary used in the second of the two test runs:


DETECTION = DETECTION + GENERAL DETECTION
AND DIRECTION FINDING + DETECTORS +
SENSORS


RADAR EQUIPMENT = RADAR EQUIPMENT +
RADAR AND RADIO DETECTION +
SENSORS + ELECTRONIC EQUIPMENT


The format for these is:

T = T + G + A + B (+ C) (+ D) (+ E)


This is the same as before except that here up to three
additional free terms were optional, as indicated by the
terms in parentheses. As before, definitors were constructed
without reference between themselves or to the
documents. However, the participation of a librarian
assured more care in their construction. This undoubtedly
also did introduce some knowledge of the nature of the
contents of the test file, but this was unintentional. (I
don't mean to imply that, normally, such knowledge
should not be intentionally introduced; here, the intention
was to reduce the number of parameters that would
have to be outguessed in the appraisal, preferably by
erring on the side of disadvantage to performance.) The
two factors, more terms and greater care, did appear to
show a definite increase in power. SENSORS relates the
two above nonsystematically. Formalizing definitors
along such lines as the next example would make such
relating less a matter of chance.

As a final sample, the following definitor was not used
in the work reported in this paper, but is taken from the
dictionary I constructed for a follow-on study: (2)


OAK = OAK + FOUR + 41011 + 4ITIC + 410 + 410.51 +
WOOD + TREE + BROADLEAF + 
LARGE + ACORN + NUT + NOUN + 
CONCRETE + FAGACEOUS + — + — + 
DECIDUOUS 


The format for this one can be represented as:


T = T + R1 + R2 + R3 + R4 + R5 + F1 + F2 + D1 +
D2 + D3 + D4 + G1 + G2 + M1 + M2 + M3 +
M4


where T is the same as before, the R terms are from
Roget's Thesaurus, the F's from The Golden Encyclopedia,
a children’s book, the D’s from Webster's New
Collegiate Dictionary, the G’s are parts of speech, and
the M’s are for free terms. I had hoped to take additional
constituents from a faceted classification scheme
but time did not permit. This definitor is shown here
only in the interest of bettering communications: something
more like it is what I really have in mind when
talking about definitors unless otherwise indicated.


* 4. The Pseudometric


To simply replace file descriptors by definitors similar
to the last one shown would be disastrous in that it
would result in a tremendous number of documents of
little or no relevance being retrieved. In trying to escape
the devil of paucity and sterility of word associations
based on first-order descriptors we would drown in the
deep blue sea of too many and too tenuous word associations
based on second-order descriptors. Such an increase
in “false drops” has no doubt discouraged many attempts
to increase the richness of word associations in the
retrieval process. Lancaster and Mills stated the generally
held belief, “Devices such as confounding of word
forms or generic searching may, by enlarging the classes
to be searched, improve recall. But they will do this only
at the expense of relevance.” (3)

Fortunately, quantifying and normalizing functions are 
available which simply thrive on such multiplicities of 
associations. The kind of functions I have in mind are, 
in essence, statistical “mechanisms.” As such, the larger 
the “sample” they have to work from, the better they 
work. In fact, I suspect that the main reason for their 
not having been much more effective to date is just that 
it has heretofore been impracticable to “feed” them with
large enough “samples.” This need can be satisfied by the
definitor in an intuitively meaningful way. This, in turn,
enables the quantifying and normalizing function to 
return the favor by satisfying the need created by the 
definitor for (a) a means to control the quantity of 
items retrieved and (b) a means to ensure that, whatever 
quantity is chosen, it will include the most relevant items, 
i.e., a means of ordering the items according to relevance 
to the retrieval request. 

The particular function I use is the “pseudometric”¹
introduced and discussed by Rial (4), though such functions
as those of Stiles, Maron, and others discussed by
Hayes (5) may be able to be used. This calculates a
“distance” in a “concept space” between one string of
descriptors, say those expressing a query or request to
the information store, and another string of descriptors,
say those of any of the documents in the store. The
formula is simply





$$F = 1 - \frac{|A \cap B|}{|A \cup B|}$$


where F stands for the “pseudometric distance,” A and B
represent the two descriptor sets, “∩” is the symbol
used in logic for the “meet” or “intersection” of two sets,
“∪” is the symbol for the “union” of two sets, and the
pair of vertical lines is used to denote that the value
expressed is the cardinality (number of things) and not
the actual indicated things themselves. Thus, the nu- 
merator of the fraction part is the number of descriptors 
the two strings have in common and the denominator is 
the total number of different descriptors in the two 
strings. 


¹ The reason that this is referred to as a “pseudometric” rather than
a “metric” is that while it meets two of the three requirements for a
metric (the symmetry requirement, δ(x, y) = δ(y, x), and the triangle
inequality, δ(x, y) + δ(y, z) ≥ δ(x, z)), the zero-distance requirement for
a “metric” (δ(x, y) = 0 if and only if x = y) is relaxed to allow
δ(x, y) = 0 even if x ≠ y.
All “distances” fall into a range of zero to one regardless
of the number of descriptors in either string, thus
providing normalization. Subtracting the fraction part 
from unity has the incidental but happy result of making 
the calculated results compatible with our intuitive feel- 
ing for distances. That is, a distance of “zero” shows 
maximum conceptual closeness (when the two strings 
are identical) and a distance of "one" shows complete 
irrelevance (when the two strings are entirely different). 
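
For readers who prefer code to notation, the formula can be written as a short Python function. This is a minimal sketch rather than the COMIT program described later in this paper: the function name `pseudometric_distance` is an assumption, the descriptor strings are treated as true sets (so repeated terms collapse), and the handling of two empty strings is an arbitrary choice made only to avoid division by zero.

```python
def pseudometric_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of descriptors."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty strings: treated here as identical (an assumption).
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


# Identical strings are at distance zero, entirely different strings
# at distance one, and everything else falls in between.
print(pseudometric_distance(["GUNS", "BUTTER"], ["GUNS", "BUTTER"]))
print(pseudometric_distance(["GUNS"], ["BUTTER"]))
print(pseudometric_distance(["GUNS", "BUTTER"], ["GUNS"]))
```

Because the result depends only on the two counts, strings of very different lengths are compared on the same zero-to-one scale.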

The way the pseudometric works, and the way it
complements the definitor, will be clear from the sample
calculations to be given. First, a few comments about
my choice of this particular function: most important is 
that it is the simplest function I know of which satisfies 
the needs introduced by the definitor. Also it is intuitively 
comfortable to work with. It does, however, have weak- 
nesses. The most noteworthy of these may be illustrated 
by the following case: 


Documents    Descriptors
A            j
B            j, k, l, m


then

$$F = 1 - \frac{1}{4} = .75$$


Since the subject of document A is indicated as being 
completely encompassed by the subject matter of docu- 
ment B, to say they are so far apart seems questionable. 
Yet, on the other hand the added subject matter of B
makes it unlike A. Would we want to consider using some
function that would show A as being closer to B than
B is to A? Or would we be able to develop a correction
factor for such cases, where the string lengths are dif- 
ferent? I don’t know, but I do believe that if we remem- 
ber that the problem posed here is a psycholinguistic one 
rather than an arithmetical or logical one, we'll be able 
to handle it even if different means are needed in different 
environments. 
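
The weakness can be made concrete with a small Python sketch. It assumes, for illustration only, a document A with the single descriptor j and a document B with the descriptors j, k, l, m. The directional measure `uncovered_fraction` is purely hypothetical, one possible answer to the question raised above and not a function proposed or tested in this paper.

```python
def pseudometric_distance(a, b):
    """Symmetric distance 1 - |A ∩ B| / |A ∪ B| (strings must not both be empty)."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)


def uncovered_fraction(a, b):
    """Hypothetical directional measure: share of A's descriptors absent from B.

    Requires A to be non-empty. Unlike the pseudometric it is not symmetric.
    """
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a)


doc_a = {"j"}                    # assumed descriptors of document A
doc_b = {"j", "k", "l", "m"}     # assumed descriptors of document B

print(pseudometric_distance(doc_a, doc_b))   # the symmetric distance
print(uncovered_fraction(doc_a, doc_b))      # A measured against B
print(uncovered_fraction(doc_b, doc_a))      # B measured against A
```

Under these assumptions the directional measure places A at zero distance from B while leaving B three quarters of the way from A, which is the kind of asymmetry the question contemplates.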


* 5. The Procedure 


To help follow the calculations and discussions in the 
ensuing sections, the operational retrieval procedure I am 
proposing is outlined here in its barest essentials. I 
assume that we are talking about fully automated re- 
trieval. For simplicity at the moment I assume that 
queries to the system are expressed in terms of a string 
of descriptors for each of which there is a definitor in the 
dictionary of the same kind as those used to define the 
document descriptors used in the system. 

First, each query descriptor would be replaced by its 
definitor by means of an automatic dictionary look-up. 
Then, in turn, each document in the file would have its 
descriptors replaced by their definitors and a pseu- 
dometric distance between it and the query would be 
calculated on the basis of the two strings of second-order 
descriptors. Then each pseudometric distance so calculated
would be compared to a preset cut-off value
(which I assume could be changed at will between runs). 
All documents whose pseudometric distance from the 
query was greater than the cut-off value would be dis- 
carded. Those remaining would be ordered according to 
their distances, those with the least distance first, and 
presented, together with the calculated distances, as the 
retrieval result. 
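
The outline above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used for the tests: the function names, the document identifiers D1 to D3, and their descriptor assignments are all hypothetical, the two definitors are copied as printed in Section 3, and a descriptor with no dictionary entry is simply kept as itself.

```python
# Definitors copied from the first test run; everything else is hypothetical.
DEFINITORS = {
    "DETECTION": ["DETECTION", "GDDF", "SENSORS", "SEARCH"],
    "RADAR SIGNALS": ["RADAR SIGNALS", "ELT", "ELECTROMAGNETIC",
                      "WAVES", "DETECTION"],
}

# Hypothetical file: document identifier -> first-order descriptors.
DOCUMENTS = {
    "D1": ["RADAR SIGNALS"],
    "D2": ["DETECTION"],
    "D3": ["OAK"],
}


def expand(descriptors, definitors):
    """Replace each descriptor by its definitor (or keep it if none exists)."""
    terms = set()
    for descriptor in descriptors:
        terms.update(definitors.get(descriptor, [descriptor]))
    return terms


def distance(a, b):
    """Pseudometric distance between two sets of second-order descriptors."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def retrieve(query, documents, definitors, cutoff):
    """Return (document, distance) pairs within the cut-off, nearest first."""
    query_terms = expand(query, definitors)   # formed once, reused per document
    results = []
    for doc_id, descriptors in documents.items():
        dist = distance(query_terms, expand(descriptors, definitors))
        if dist <= cutoff:                    # discard anything beyond the cut-off
            results.append((doc_id, round(dist, 4)))
    return sorted(results, key=lambda pair: pair[1])


print(retrieve(["DETECTION"], DOCUMENTS, DEFINITORS, cutoff=0.999))
```

In this toy file, D1 shares no first-order descriptor with the query and would be missed by a direct match, yet it survives the cut-off here because DETECTION appears in the definitor for RADAR SIGNALS.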

The foregoing is intended only as an outline. By “pre- 
senting” a “document” in the output I mean anything 
from an identifying symbol to the document itself — it’s 
the selection process we're concerned with. "Each docu- 
ment" could be “each document that had survived some 
preselection routine, such as taking the query definitor 
constituents through a concordance.” The cut-off value 
could as well be applied to each pseudometric distance 
as soon as it was calculated. And obviously, if it was 
desired to buy time by paying storage space, the substi- 
tution of definitors for the document descriptors could be 
done when the document (representation) was stored. 
Of course, then the file (rather than just one entry in the 
dictionary) would have to be updated whenever a defini- 
tor was changed in any way. 


* 6. Feasibility Demonstration 


As a feasibility test of the proposed procedure, I wrote 
a computer program in the COMIT programming language²
and used it to make one test run of five queries
against a test corpus of 41 document representations and 
another test run of three different queries against a dif- 
ferent test corpus of 32 document representations. The 
way in which this program works is illustrated in some 
detail to give a better picture of the nature of the pro- 
posal. The fact that the program exists and ran success- 
fully merely proves that it is indeed feasible to construct 
a program which will implement the proposed procedure 
and that it will in fact operate on a computer as planned. 
It says nothing about the validity or the worth of the 
procedure itself. 

To provide a basis with which to compare the retrieval 
results achieved by using the second-order descriptors, 
the program first calculates the pseudometric distance 
between a query and a document using only the regular 
file (first order) descriptors. It should be realized that 
this would not be done in an operational environment; 
it was done simply for evaluation purposes. For ease of 
reference we will call the distance so calculated the 
“first-order distance" to distinguish it from the “second- 
order distance” resulting from use of the second-order 
descriptors.

Figure 1 shows how the first-order distance was cal- 
culated. The query and document used in the figure are 
taken from the second test run, so the calculation shown 
is one of those actually done by the computer. The query


2 Developed by V. H. Yngve at MIT (8). 
Fig. 1. Calculation of first-order distance


descriptors are brought in from storage and counted to
start off the count of different descriptors. Then the
document descriptors are brought in from storage one
at a time and checked against those of the query. Here,
the first one, DIGITAL COMPUTERS, does not find a
match so a count is added to the count of different
descriptors. (To find a match, of course, the computer
must find a perfect character-by-character coincidence
between the two terms.) The next, PROGRAMMING,
finds a match so a count of one is recorded to start the
count of “same” descriptors. The last, LANGUAGE,
also finds a match adding another “same” count. The
first-order pseudometric distance is therefore

$$F = 1 - \frac{\text{No. same}}{\text{No. different}} = 1 - \frac{2}{4} = 1 - .5000 = .5000$$





This is recorded in temporary storage and the program
proceeds to its principal task, the calculation of the
second-order distance.

Figure 2 shows schematically how the two strings of
second-order descriptors are formed by replacing each
descriptor in the query and in the document with its
definitor. (Actually once the string of second-order descriptors
for the query has been formed for use against
the first document it is saved for use against succeeding
documents, eliminating pointless repetitions.)

Fig. 2. Formation of second-order descriptors

Figure 3 shows how the second-order distance was 
calculated. As before, the count of “different descriptors” 
is started off by counting all the query descriptors, then 
a one-by-one check of the document descriptors against 
the query descriptors is made. 

To clarify what is happening, the first term in each 
definitor (which, it will be remembered was just a 
repetition of the term being defined) is underlined. Note 
that COMPUTERS appears twice in the query’s string, 
once as the repetition term in the definitor for COM- 
PUTERS and then as the second term in the definitor 
for PROGRAMMING (since it was the term generic to 
PROGRAMMING in the terminology control list used). 
As the program works at present, these two occurrences 
of COMPUTERS are each counted as an additional 
“different descriptor” in the first count of the query 
descriptors. Furthermore, note that COMPUTER also 
appears twice in the document’s string. Each of these,
when being checked against the query descriptors finds
a match immediately and the count of “same” descriptors
is increased by one for each of them. The check stops,
for each term, when a match is found so that the further
match possibility is not found. The matter of how such
repetitions in either string should be handled needs 
further study, especially since it would seem intuitively 
that just such repetitions would be particularly signifi- 
cant as indications of conceptual closeness. The way they 
were handled here tends to yield a smaller distance in 
such cases, as seems proper, but if the second occurrence 
of COMPUTER in the query string had not appeared 
there but instead had been a third occurrence in the
document string the count would appear to indicate a 
logical absurdity, i.e. that the set intersection was 
greater than the set union. The distance would have 
come out 


$$1 - \frac{15}{14} \quad \text{or} \quad 1 - 1.0714 \quad \text{or} \quad -.0714$$


Perhaps we should accept such a result as a signal of 
"extra closeness!” 
Fig. 3. Calculation of second-order distance

By so defining the quantifying and normalizing function
as to permit it to range into the negative real numbers
as well as the non-negative real numbers, with
ordering still done algebraically (e.g., —3 < —2), it seems 
possible that greater latitude is made available for com- 
parisons of close “distances” occurring in some such 
manner as that just indicated above. Such latitude just
might make an iteration procedure more worthwhile than
it would be otherwise, for example.
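
The counting procedure described in this section, including its treatment of repeated terms, can be sketched in Python. This is a reconstruction from the prose, not the COMIT program itself: the function name `counted_distance` is an assumption, the first-order query descriptors are inferred from the description of Figure 1, and the terms T0 through T13 in the second example are hypothetical stand-ins chosen only to reproduce the counts of 15 and 14.

```python
def counted_distance(query_terms, document_terms):
    """Distance by counting, repeats included, as the text describes.

    Every query term (repeats too) starts the "different" count. Each
    document term then adds to "same" if it matches any query term,
    and to "different" otherwise.
    """
    if not query_terms:
        raise ValueError("the query string must contain at least one term")
    different = len(query_terms)
    same = 0
    for term in document_terms:
        if term in query_terms:      # the check stops at the first match
            same += 1
        else:
            different += 1
    return 1 - same / different


# First-order example of Fig. 1 (query descriptors inferred from the text).
query = ["COMPUTERS", "PROGRAMMING", "LANGUAGE"]
document = ["DIGITAL COMPUTERS", "PROGRAMMING", "LANGUAGE"]
print(counted_distance(query, document))

# Hypothetical strings: 14 query terms, and a document string that
# repeats one of them, so "same" (15) exceeds "different" (14).
terms = [f"T{i}" for i in range(14)]
print(round(counted_distance(terms, terms + ["T0"]), 4))
```

Because repeats are counted rather than collapsed, this procedure is not a true set computation, which is exactly why the negative value becomes possible.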

To return to the computation, the second-order dis- 
tance for this query-document pair came out to be .0667, 
much more as, intuitively, it should be. The program 
then records this in temporary storage and proceeds to 
calculate the two distances between the same query and 
the next document in the file. The results of applying the 
cut-off value to the resulting first- and second-order 
distance lists, and of then ordering, labeling, and printing 
out those left is shown in the next section. 


* 7. Results 


The results obtained from the two test runs are pre- 
sented in Tables 1 through 8. The ordered lists and asso- 
ciated pseudometric distances are copied directly from 
the computer print-out. The figures in the single digit 
columns give the relevance of each document to the query 
as judged by a professional librarian. A “0” indicates 
“complete relevance” and signifies that, to the satisfaction
of the judge, the document would have given a
sufficient answer to the query all by itself. A “1” represents
nearly complete relevance in the sense that the
judge considered that the document almost, but not
quite, answers the query by itself. A “2” indicates that
the document was still considered fairly relevant but not
quite so much so, and so on. A dash indicates that the
document was not considered at all relevant to the particular
“need” represented by the query. Brackets have
been added to show those documents calculated as being
at the same distance as a group. In such cases, the order

TABLE 1. Query 1: Documents on cartography, specifically
those relating to methods of producing maps automatically
by digital means.

         First Order                      Second Order
Doc. No.  Distance  Judge        Doc. No.  Distance  Judge
22088     .3335     2            22088     .4000     2
          .8335     4            1510      .5712     0
15101     .8574     0            15392     .6660     1
15392     .8750     1            20839     .6664     4
          .8750     3            22079     .7364     3
                                 23145     .8568     5
                                 22078     .9163     6
                                 121       .9338     -
                                 23190     .9338     2
                                 23198     .9570     -
TABLE 2. Query 2: All documents on ablation and reentry vehicles.
First Order Second Order 
Doc. No. Distance Judge Doc.No. Distance Judge 
10005 6666 1 3750 1 
paie 6686 2 Lum 3759 2 
0 5831 0 
— ` 8671 3 
20340 .9163 4 
23169 9163 5 
20341 9163 6 


within the group is simply the order in which the computer
came to them, i.e., their original order in the file.

A discussion of Query 7 (see Table 7), which is already
somewhat familiar from the preceding section, will clarify
the meaning of these results. When the regular file
descriptors were used, 7 documents survived the application
of the cut-off value and were ordered as shown.
When the second-order descriptors were used, 5 additional
documents were retrieved and the 12 were ordered
as shown. One which was judged not relevant, at the
bottom, just squeaked by the cut-off value which was
set to drop anything above .9990.³ Notice how Document
No. 33944, used in the illustrative example in the
last section, was moved from second place on the first-order
list to an unequivocal, and intuitively correct, position
in first place on the second-order list. Note also the
greater over-all ordering power shown by the second-order
list compared with that shown by the first-order
list.

The procedure had its only clear-cut failure with 
Query 5 where it not only did not uncover any additional 
relevant documents, or improve on the ordering of the
first-order search, but dug up seven “trash” items,
documents that were judged to be totally irrelevant as 


TABLE 3. Query 3: All documents on harmful atmospheres.

         First Order                      Second Order
Doc. No.  Distance  Judge        Doc. No.  Distance  Judge
23181     .7500     0            23181     .4545     0
                                 23188     .6921     0
                                 23182     .7145     1





³ Actually, this document had no descriptors in common with the
query, so should have been distanced at 1.0000, thus dropped by the
cut-off value as set. The reason it wasn't is as follows: Because of the
difficulty of doing division in COMIT (which otherwise is excellent for
this application) I converted the pseudometric as shown:

$$F = 1 - \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cup B|}{|A \cup B|} - \frac{|A \cap B|}{|A \cup B|} = \left(|A \cup B| - |A \cap B|\right)\left(\frac{1}{|A \cup B|}\right)$$

and used a table look-up of the reciprocal of |A ∪ B|, which was
then added to itself a number of times equal to (|A ∪ B| − |A ∩ B|).
This resulted in accumulating any round-off error in the reciprocal,
as in this case, where |A ∪ B| was counted as 27 and the reciprocal of 27
was given in the table as .0370, which, added to itself 27 times, gave
.9990.
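
The accumulation of round-off error described in this footnote can be reproduced with exact decimal arithmetic. The sketch below is an illustration only: the function name and the one-entry table are assumptions, and the table value .0370 is simply 1/27 (.037037...) rounded to four places.

```python
from decimal import Decimal


def distance_by_repeated_addition(union_count, intersection_count, table):
    """Compute (|A ∪ B| - |A ∩ B|) * (1 / |A ∪ B|) without dividing.

    The reciprocal comes from a four-place look-up table and is added
    to a running total once for each unit of the difference.
    """
    reciprocal = table[union_count]
    total = Decimal("0")
    for _ in range(union_count - intersection_count):
        total += reciprocal
    return total


# One-entry stand-in for the reciprocal table.
RECIPROCALS = {27: Decimal("0.0370")}

# No descriptors in common: the exact distance would be 1.0000.
result = distance_by_repeated_addition(27, 0, RECIPROCALS)
print(result)
```

The rounding error in the single table entry is multiplied by 27, which is enough to bring the result down to the cut-off value and let the document through.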


TABLE 4. Query 4: Automatic language abstracting.





Second Order 


oc.No. Distance Judge Doc. No. Distance Judge 


0 
0 
3 
2 
4 
5 
7 





far as answering the query was concerned. However, a
cut-off of .9300 would have eliminated all of these false
drops.

The procedure had a clear-cut success with Query 3
where it uncovered one “direct hit” that had not been
found by the conventional search and another highly
relevant document and properly ordered them. The apparently
remarkable ordering power shown by Query 7
was also shown by Query 6. The ordering in Query 8,
while not as spectacular as the preceding two, nevertheless
would seem to be good enough to perform a useful
function. Comparing the results of the second run (6, 7,
and 8) with the first (1 through 5), it is easy to imagine
that beneficial effects of the slightly longer and slightly
more carefully constructed definitors are manifesting
themselves in terms of increased ordering power.

In general, the retrieval based on the second-order
descriptors produced more documents in every case, as
would be expected with the cut-off set only to eliminate
documents that had no descriptors in common with the
query. The significant question is: Was there evidence
of sufficient improvement in the ordering power to


TABLE 5. Query 5: Display systems and aerial
photographic technique.

First Order Second Order
Doc. No. Distance Judge Doc. No. Distance Judge





TABLE 6. Query 6: All documents on weapons effectiveness
in military operations.

First Order Second Order
Doc. No. Distance Judge Doc. No. Distance Judge
33992 3333 0 33992 2668 0 
33903 6000 4 33932 4704 1 
33932 7500 1 Pese 4704 2 
Ed .7500 2 33901 6812 3 
33901 7500 3 33903 £6400 4 
33943 S000 8 33916 7600 5 
33920 | 8000 7 | sois 7600 | 6 
33916 8000 5 33920 4400 7 
33015 8000 6 33943 8400 8 
33933 8645 0 
33975 8757 10 
33961 9591 | — 
bese 850] — 


promise that when the cut-off value is used as a control 
to limit the number of documents retrieved, the most 
relevant ones would be retained?

Insofar as these test results are indicative, the answer 
to this question would seem to be yes. For one thing, of 
an over-all total of 42 documents retrieved by the second-order
searches over and above those that had been
retrieved by the first-order searches, all but 17 were
judged relevant to the query to some extent. None of
these 17 “completely irrelevant” documents would have
been retrieved if the cut-off value had been moved by as
little as from .9990 to .9300. The cost, in terms of relevant
documents discarded with them, would have been four; 
these had relevancy judgments of 2, 7, 8, and 9 (in 
Queries 1, 4, and 8). For another thing, as remarked be- 
fore, several of the individual query results do show a 
definite increase in ordering power while none shows a 
decrease. 

The fact is, however, that generalization from these 
results is not warranted. Even if we knew what the 


TABLE 7. Query 7: All documents relating to computers
and programming language.

First Order Second Order
Doc. No. Distance Judge Doc. No. Distance Judge
33967 3333 1 33944 0667 0 
` 83044 5000 0 33967 3335 I 
33924 6666 3 33941 4002 2 
| 23903 6866 4 33924 6670 3 
33941 7500 2 Pes 6670 4 
ES .7500 5 6838 4 
33922 .7500 4 33925 7384. 5 
33005 7500 6 
) 33008 4890 7 
25] 7880 7 
33943 9250 -8 
TABLE 8. Query 8: All documents on detection of
satellites and guided missiles.

First Order Second Order
Doc. No. Distance Judge Doc. No. Distance Judge
33920 3335 2 22920 7395 
| 33918 8335. 1 33917 7770 
83017 S335 0 33918 .8002 


important parameters were in such an interplay of
documentation, linguistics, statistics, and psychology as 
we have here, we couldn’t say how they interrelate. Con- 
sequently, it cannot be guaranteed that the results just 
discussed could not have been produced by a fortuitous 
combination of atypically advantageous factors. There- 
fore, these results must be considered indicative rather 
than conclusive. This is not to say that they cannot be 
considered generally as quite encouraging. I so consider 
them and hope to soon have an opportunity to plan and 
conduct better designed and more extensive tests. 


• 8. Possibilities


In this section I point out some of the ways in which
the definitor-pseudometric combination, by exploiting
relations inherent in natural language terms, would seem
to offer increased efficacy in the retrieval of information
from existing descriptor-indexed files without requiring 
that they be re-indexed. I believe that, as with the basic 
proposal itself, the validity in principle and the feasibility 
of computer implementation of each of these is fairly 
self-evident. Again, as with the basic proposal, their 
worth under various sets of circumstances remains to be 
explored. In other words, no more is being claimed for 
them than for the basic proposal, although, for simplicity 
of presentation, repetitions of such qualifications and dis- 
claimers will be omitted. 

First, and perhaps most important, is that new terminology
can be introduced. For example, the term “laser”
along with a definitor for it can be put in the dictionary
even though the file may have been generated and
indexed before lasers were invented. Then if “laser” is
used in a query it will be interpreted into other concepts
which may find relevant material in the file. “Laser” can
also be added to appropriate terms of a file as a definitor
constituent. For example, such insertion would serve to
create or reinforce a link between “coherent radiation”
and “stimulated emission” which may not have been
considered important when the file was first set up.

Biases may be introduced deliberately through the
mediation of the definitors. A mining company, a watchmaker,
and a physicist would presumably think of “ruby”
in quite different ways and want correspondingly different
associations with other terms to be made for retrieval
purposes.

If they are known, biases introduced by an indexer
can be corrected or compensated for. Suppose one file
was known to index documents on “failure” under
“reliability,” whereas another did not. To some extent different
definitor dictionaries could help correct for this difference.
In this connection it seems possible that statistical
word association techniques might be able to assist by
detecting, and perhaps even quantifying, such biases.

Queries and document descriptors can be put through
different dictionaries, each designed for specific effects.
This would be particularly useful in the sometimes crucial
and often frustrating searches by a person of one discipline
in files created by, or for, persons of other disciplines.
This technique could be extended, if found worthwhile.
For example, a query from a member of the
physics department could be put through a different
dictionary than one from a member of the electrical
engineering department, etc. In principle this could be
extended to individuals.

The four just discussed are more a system design consideration
than otherwise. If they were not available, no
one would wait for them; if they were part of the system,
there might not be any alternative to using them. The
next four are more the sort of thing one might think of
as being used on-line as part of the “dialog” between the
man and the machine that time-sharing is soon to bring
into reality. The first of these has already been mentioned:
the on-line adjustment of cut-off values according
to retrieval returns. The querier would enter his
query using an initial cut-off value determined by experi- 
ence in some way. This initial setting could be different 
for different circumstances if experience showed such to 
be appropriate. It could be determined automatically for 
each query if desired. An attractive and easily imple- 
mented option here could have the computer, after mak- 
ing its search, inform the querier how many documents 
he was about to get before actually giving them (or their 
representations) to him. This would enable him to vary 
the cut-off to reduce (or increase) the number received 
before accepting the output results. 
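The count-before-delivery option described above can be sketched as follows. This is only an illustration under stated assumptions: the document identifiers and distances are hypothetical, the rule that a document is kept when its distance is less than or equal to the cut-off is assumed, and the fixed step used to tighten the cut-off is an arbitrary choice.

```python
def count_within_cutoff(distances, cutoff):
    """Number of documents that would be delivered at this cut-off."""
    return sum(1 for d in distances.values() if d <= cutoff)


def retrieve(distances, cutoff):
    """Documents within the cut-off, nearest first."""
    hits = [(doc, d) for doc, d in distances.items() if d <= cutoff]
    return sorted(hits, key=lambda pair: pair[1])


def tighten_cutoff(distances, cutoff, max_docs, step=0.05):
    """Lower the cut-off in fixed steps until no more than max_docs remain."""
    while cutoff > 0 and count_within_cutoff(distances, cutoff) > max_docs:
        cutoff = round(cutoff - step, 4)
    return max(cutoff, 0.0)


if __name__ == "__main__":
    # Hypothetical distances of four documents from a query.
    distances = {"D1": 0.25, "D2": 0.5, "D3": 0.75, "D4": 0.95}
    print(count_within_cutoff(distances, 0.99))   # what the querier is told first
    new_cutoff = tighten_cutoff(distances, 0.99, max_docs=2)
    print(new_cutoff, retrieve(distances, new_cutoff))
```

The querier sees only the count on the first call and receives documents only after settling on a cut-off, which mirrors the dialog suggested in the text.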

Various combinations of weightings, threshold values,
go-no-go requirements, and logic specifications could be
applied to various specified definitor constituents or
combinations of them to control the search process. It would
be presumptuous, until the applicability of the basic
proposal has been more thoroughly explored, to discuss
such possibilities in detail or at length. To clarify the
point, however: assuming, for illustrative purposes only,
definitors formatted like the one given earlier for OAK,
the querier might want to require that a document, to be
selected at all, must have one or more matches with, say,
AIII in the third definitor-constituent position. This
would insure that each of the documents retrieved would
have something to do with organic matter (as such is
considered by Roget at least!). As techniques are
developed in the interface area between artificial
intelligence and linguistics, such tactics as this might be
made automatically adaptable to various circumstances
or parameters.
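A go-no-go requirement of this kind might be sketched as below. The data layout is an assumption: each document is represented by the definitors of its descriptors, each definitor is a fixed-position tuple, and every constituent code other than AIII is a made-up placeholder, since the OAK definitor itself is not reproduced in this section.

```python
def has_required_constituent(definitors, position, required):
    """True if any definitor carries `required` at `position` (0-based)."""
    return any(
        len(definitor) > position and definitor[position] == required
        for definitor in definitors
    )


def select(documents, position, required):
    """Identifiers of documents passing the go-no-go requirement."""
    return [
        doc_id
        for doc_id, definitors in documents.items()
        if has_required_constituent(definitors, position, required)
    ]


# Hypothetical file: document -> definitors of its descriptors.
documents = {
    "DOC-1": [("X1", "X2", "AIII", "X4")],
    "DOC-2": [("Y1", "Y2", "BII", "Y4")],
    "DOC-3": [("Z1", "Z2", "BII", "Z4"), ("W1", "W2", "AIII", "W4")],
}

if __name__ == "__main__":
    # Third definitor-constituent position is index 2.
    print(select(documents, position=2, required="AIII"))
```

Applying the filter before any distance is computed keeps the requirement separate from the pseudometric, so the ordering of the surviving documents is unaffected.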

Selection subroutines can be installed to cause the
dictionary to look to the program as if it consisted only
of some specified subset of the constituents in each
definitor. Thus, if the querier wanted a connotative,
browsy search he might want to use only definitor-constituents
which had been derived from Roget. If he
wanted a more denotative, definitive search he might
want to use only those he knew to have been taken from
a technical encyclopedia. Single or double concordances
can be combined effectively with such selection subroutines
if over-all system design considerations so indicate.
Figure 4 shows how this might work. The selection
subroutine would not merely affect the calculation of the 
pseudometric distance (see arrows from “Dictionary” 
down to “Doc 20” and “Q”), but would in the first place 
have affected the list of definitor constituents used for 
entry to the “First Concordance” (which for each second- 
order descriptor would list the first-order descriptor in 
whose definitor it had appeared). Such concordances, to
be practicable, would have to be automatically prepared,
updated, and purged from file entries and changes
thereto.

Last, but by no means least, the definitor provides a
way to get classification back into the picture. By dedicating
a number of positions in a formatted definitor to
constituents from various levels or facets of a classification,
matches will be generated according to the structure
of the classification scheme. This introduces a self-weighting
mechanism in the case of hierarchical classifications
because two terms falling into the same class at
some level would result in matches also being generated
on each higher level of that classification scheme which is
used in the definitor format. The way this works can
be seen with reference to the way the definitor-constituents
from Roget were formatted in the illustrative definitor
shown for OAK. A definitor constructed in the same
format for MAPLE would provide not one but five
matches with OAK since “maple” is listed by Roget in
the same most specific class, 410.51, as is “oak,” and so
would also have the same definitor-constituents (which in
this case are codes designating Roget classes rather than
natural language terms) for all four generic levels.
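The self-weighting effect can be illustrated with a short sketch. Only the class number 410.51 comes from the text; the codes G1 through G4 stand in for the four generic Roget levels and, like the third term, are hypothetical placeholders.

```python
def class_path_matches(path_a, path_b):
    """Count definitor positions at which two classification paths agree."""
    return sum(1 for a, b in zip(path_a, path_b) if a == b)


# Four generic levels (placeholder codes) followed by the most specific class.
OAK = ("G1", "G2", "G3", "G4", "410.51")
MAPLE = ("G1", "G2", "G3", "G4", "410.51")
# Hypothetical term sharing only the two most general levels with OAK.
OTHER_TERM = ("G1", "G2", "H3", "H4", "H5")

if __name__ == "__main__":
    print(class_path_matches(OAK, MAPLE))       # same most specific class
    print(class_path_matches(OAK, OTHER_TERM))  # related only at general levels
```

Terms that share a specific class automatically share every level above it, so closeness in the hierarchy translates directly into a larger number of matches without any explicit weights.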


• Conclusions


The proposed procedure appears to rest on a sound 
rational basis. Results obtained in two demonstration 
runs are confirmatory and encouraging but are not con- 
clusive. Therefore, adequately designed and carefully 
implemented tests to confirm or deny the general validity 
of the procedure should be conducted. Upon receipt of 
affirmative results, parametric studies of alternatives, 
tradeoffs, costs, worth, and so forth should be made to
provide the kind of information that would be needed by 
system designers interested in applying it. 
want to express my sincere thanks to Mr. Casey.


This paper was presented in a condensed form on
October 14, 1965 at the 31st Congress of the International
Federation for Documentation, held in Washington,
D. C. I wish to express my appreciation to the
MITRE Corporation for granting the time and facilities 
needed to prepare that oral presentation and this written 
version, as well as for the sponsorship of the MITRE 
Corporation Educational Assistance Program which 
made it possible for me to take Professor Yngve's course 
in the first place. 
Specialized Medical Area*


This is a report of word usage in radiological (x-ray) 
patient records as found in a 5% sample of the annual 
case load at UAMC including 100,000 words. Records 
were taken exactly as dictated. The study is part of 
an effort to develop an IR system for patient data. The
system “autocodes” (automatically stores) the physician's
dictated findings and diagnoses in such a fashion
that they can be retrieved again automatically.

Some of our findings approximate results reported
in the literature. For example, the rate of introduction 
of new different words levels off to about 2,500 words 
when 40,000 to 50,000 words of text have been ana- 
lyzed. However, unclassified words continue to occur 


• 1. Introduction

This report covers a word analysis of the running text
of radiologists’ dictation. We have analyzed the first
100,000 words in 3,321 radiological (x-ray) records comprising
170,000 words. The study is a part of a major
effort which has been underway for several years at the
University of Arkansas Medical Center, and is devoted
to the development of an information storage and retrieval
system designed to handle radiological patient
data.

The over-all system is explained in an earlier paper 
which appears in the April 1966 issue of American 
Journal of Roentgenology (1). The system that we are
developing is one in which the physician’s dictated findings
and diagnoses are automatically stored in a key
word type index in such a fashion that they can be retrieved

* Assistance for this research was provided by the University of 
Arkansas Medical Center Research Computation Laboratory, which is 
supported by NIH Grant FR 00208-02.

† Respectively, Associate Professor and Head, Biometry Division;
Professor and Chairman, Department of Radiology; and Systems Analyst,
Research Computation Laboratory.


Dictionary Buildup and Stability of Word Frequency in a 


at a significant level of almost 2% at the 100,000 
word level, with a 1% noise level. 

Attempts to establish the rank order of words be- 
yond the first several hundred have failed because 
about 70% of the words appear to occur with such a 
low relative frequency (no more than one time in 
10,000). Thus, establishing files by rank order appears 
impractical, even though filter lists (discard words) by 
rank groups (words with nearly the same relative fre- 
quency) are quite practical. 

Additional data are presented and design implica- 
tions are discussed. 


JOHN M. LONG, HOWARD J. BARNHARD, AND
GERTRUDE C. LEVY †


University of Arkansas Medical Center
Little Rock, Arkansas


again automatically by use of key words, or in a number
of additional ways.

The term “key word” as used here stands for a subset
of the English language consisting of words which are
considered to be important for some specialized area, in
this case, radiology. These are words that radiologists
consider important or those that have special technical
meaning to radiologists. This subset probably will include
a maximum of 2,500 to 3,500 words.

In addition, there will be a nonoverlapping subset of at
least 2,500 to 3,500 words which are used frequently in
the text of the radiologists’ dictation. At the same time, 
they are words which have little or no information useful 
to the radiologist. These are called “discard” words. 

Words that are not selected for the key word or discard 
list are called unclassified words. In our system such 
words are printed out for a specialist to analyze and 
determine which of three fates the word will have; either
it is added to the key word or to the discard list, or it is
left as “noise” if it is not used frequently enough and
does not have sufficient information to be considered
suitable for either list.¹ Figure 1 presents a Venn diagram
of our concept of how the words of the English language
will break down in our system.

The principal reason for the studies reported in this 
paper was to determine the characteristics of our data 
so that we would have the information needed to properly 
design our system. A few of the pertinent questions in
this regard might be: How much data are needed to start
the system going? What are the advantages of a file
organized by the rank order of the words by frequency
of use over one organized by the alphabet? How difficult
is it to determine the rank order of the words? How
serious is the “noise” problem? How large must the
discard word lists be to reduce the “noise” level to some
specified amount such as 1% or .01%? Part of the
answers to these questions can be found in the literature 
as indicated in the next section. 


• 2. Literature Review


In 1923, Godfrey Dewey (2) analyzed 100,000 words 
of running texts of general English language material; 
his is the most comprehensive study of this type that we
know about. He found that “the,” “of,” “and,” “to,” “a,”
and “in” are the six most frequently used words in general
English text (see Table 1). Dewey stated, as a rule of


¹ In our system it appears necessary to classify virtually all words
used in order to keep “noise” to a tolerable level.
TABLE 1. Relative frequency of common words
(expressed as a percentage).

Word    This study    Dewey's study
the     9.98          7.5
of      4.32          4.0
and     3.78          3.3
is      3.10          3.3
to      1.19          2.9
a       1.25          2.1
in      1.79          2.1


thumb, that the 10 most common words would form over 
25% of the total words, the 100 most common words 
would form over 50%, and the 1,000 most common words 
would form 75% of the total number of words. 
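Dewey's rule of thumb is a statement about cumulative coverage, which can be checked on any sample of running text with a sketch like the following. The function name and the tiny word sample are illustrative assumptions, not data from either study.

```python
from collections import Counter


def coverage_of_top(words, n):
    """Fraction of all running words accounted for by the n most common words."""
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


if __name__ == "__main__":
    # Hypothetical ten-word sample of running text.
    sample = ["the"] * 5 + ["of"] * 3 + ["lung", "chest"]
    for n in (1, 2, 4):
        print(n, coverage_of_top(sample, n))
```

Run on a real corpus with n set to 10, 100, and 1,000, the same function yields the three percentages that the rule of thumb refers to.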

Andrew D. Booth (3) recently compared Dewey's list
with a similarly constructed list in a specialized area.
Using his novel approach, if the frequency of use of a
particular term in the specialized area differs greatly from
its use in general text, it is taken to be a “key word” for
this specialized language.
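One way to read this comparison is as a ratio test on relative frequencies. The sketch below is an interpretation under assumptions, not Booth's published procedure: the two-sided ratio threshold, its value, and the treatment of words absent from the general list are all hypothetical. The percentages for “of” and “to” are taken from Table 1; the entry for “lung” is invented for illustration.

```python
def key_word_candidates(special, general, ratio=2.0):
    """Words whose relative frequency in the specialized text differs from
    general text by at least `ratio` in either direction."""
    candidates = []
    for word, freq in special.items():
        base = general.get(word)
        if not base:
            # Assumption: a word unseen in general text is a candidate.
            candidates.append(word)
            continue
        r = freq / base
        if r >= ratio or r <= 1 / ratio:
            candidates.append(word)
    return sorted(candidates)


if __name__ == "__main__":
    special = {"of": 4.32, "to": 1.19, "lung": 0.80}   # percent of running text
    general = {"of": 4.0, "to": 2.9, "lung": 0.01}     # "lung" value is hypothetical
    print(key_word_candidates(special, general))
```

With this particular threshold the common word “to” is flagged along with the technical term, which shows how sensitive such a test is to shifts in the frequency of ordinary words.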

Eugene Schwartz (4, 5) and others point out that a 
specialized vocabulary will build rather rapidly to about 
2,500 words when approximately 40,000 words of running 
text have been processed. 
• 3. Methods

The raw data for the study are the vocabulary used by
radiologists in dictating their findings and diagnoses as
found in 3,321 radiology case records. The cases were
selected so as to provide a representative sample of the
approximately 45,000 cases handled by the UAMC
Radiology Department each year.

The reports have been taken exactly as dictated,² and
transcribed onto punched cards or paper tape for analysis
using the IBM 1401 Data Processing System. A number
of computer programs were written to provide the analyses
reported. The authors will provide additional information
about these programs to those interested.


• 4. Dictionary Buildup and Other Data


We found, as did Schwartz (5), that at about 40,000
words of text the first 2,500 different words had been
introduced and the introduction of new different words
had reached a relatively low level. This is shown graphically
in Fig. 2. The dotted line, using the left hand scale,
shows the rate at which new words are introduced into
the system.
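A build-up curve of this kind can be tabulated from any stream of running text. The following is a minimal sketch, not the authors' IBM 1401 programs; the function name, block size, and sample words are assumptions for illustration.

```python
def vocabulary_growth(words, block_size):
    """Return (running words processed, different words seen so far)
    after each complete block of `block_size` running words."""
    seen = set()
    points = []
    for position, word in enumerate(words, start=1):
        seen.add(word.lower())
        if position % block_size == 0:
            points.append((position, len(seen)))
    return points


if __name__ == "__main__":
    # Hypothetical six-word stream, sampled every two words.
    stream = "a b a c b d".split()
    print(vocabulary_growth(stream, block_size=2))
```

The difference between successive points gives the rate of introduction of new words per block, the quantity that levels off in the authors' data.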

Most of the “key” words in a specialized area will have
been introduced when approximately 40,000 words of
running text have been reviewed, provided the sample
is representative. In theory, the rate of introduction of
new technical terms should approach zero. However, it
is still significant at the 100,000 word level, which is as
far as our data currently go. Probably somewhere after
the retrieval system has been operating for a long time


² The persons transcribing these dictations make mistakes in typing
and in spelling, and these were corrected manually. Incidentally, the
latter especially is a serious problem and is being studied in depth as
another phase of the project.
and the number of key words has probably gone up to
3,500 or more, the rate at which new technical informa- 
tion terms will be introduced will be quite small. 

The problem of unclassified words and noise is quite
persistent, since unclassified (new) words continued to
occur at a significant rate for a long time. This is illustrated
in Fig. 2. Refer to the solid line and the right
hand scale. Either a highly trained and expensive man,
or the computer, must evaluate the unclassified words to
separate “noise” from “information” (key words) and
“persistent noise” (discard words). Noise can be reduced
to as low a level as desired, except that it can never reach
absolute zero. The tolerable level of noise varies with the
application. We think our system should have considerably
less than 1%. In the 40,000 to 100,000 word range
unclassified words occur at about 1.8%. Thus, for every
1,000 words of running text introduced, about 18 will be
unclassified, with about 2/3 or 12 as “noise” and potential
discard words, and 1/3 or 6 as new technical “key” words.

Our system uses a series of discard lists roughly 
grouped by rank. We refer to these rank groups as 
discard list filters. The first “filter” eliminates about 30 of 
the highest frequency words and reduces noise by about 
50%. A secondary “filter” of about 2,500 to 3,500 discard 
words next highest in frequency reduces "noise" to 
about 1%. We feel this level is still too high. It is not 
clear whether it would be more efficient to enlarge the 
secondary “filter” list or to create a tertiary “filter” of 
discard words to reduce noise to a tolerable level.
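The filter arrangement can be sketched as a classification pass over running text. The word lists below are tiny hypothetical stand-ins for the real primary filter, secondary filter, and key word list, and the sketch reports the share of words left unclassified rather than attempting the authors' own noise measure.

```python
def classify(words, key_words, primary_filter, secondary_filter):
    """Count running words as discard, key, or unclassified.

    The primary (highest frequency) filter is consulted first, then the
    secondary filter, then the key word list.
    """
    counts = {"discard": 0, "key": 0, "unclassified": 0}
    for word in words:
        w = word.lower()
        if w in primary_filter or w in secondary_filter:
            counts["discard"] += 1
        elif w in key_words:
            counts["key"] += 1
        else:
            counts["unclassified"] += 1
    return counts


def unclassified_rate(counts):
    """Unclassified words as a fraction of all running words."""
    total = sum(counts.values())
    return counts["unclassified"] / total if total else 0.0


if __name__ == "__main__":
    text = "the chest is clear no infiltrate seen xyzzy".split()
    counts = classify(
        text,
        key_words={"chest", "clear", "infiltrate"},
        primary_filter={"the", "is"},
        secondary_filter={"no", "seen"},
    )
    print(counts, unclassified_rate(counts))
```

Enlarging the secondary list or adding a third set to the same membership test corresponds to the two design alternatives the authors are weighing.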

We found that the first 10 words represented 31.8% 
of the total words used; the first 100 words, 64.9%; the 
first 1,000 words, 94.5%. These results follow Dewey's 
rule as stated earlier, but our percentages are higher. 

Looking at the problem another way, we found that
“the” consistently comprised approximately 10% of the
words used (actually 9.98%), “of” comprised 4.32%,
“and” 3.18%, “is” 3.10%, “to” 1.19%, “a” 1.25%, and “in”
1.79%. This is summarized in Table 1 with a comparison
to Dewey's results. There is a significant difference in the
frequency of the use of these words in radiology dictation
over that of the general English text as presented earlier.
The results suggest a potential flaw in the logic of Booth's
(3) proposal.

These findings have definite implications for design.
We considered a retrieval system designed so that no key
words or discard words were initially determined, that is,
you simply start inserting running text of the specialized 
area and analyzing the words. Initially all words would be 
unclassified and thus all of them would be printed out. 
A specialist in the area would determine those words 
which are considered key words and those which would 
be considered discard words. The words would be inserted 
back into the system and the system now would begin to 
store information about key words and to ignore discard 
words. 

To attempt to build such a system without a prelimi- 
nary analysis of some text material does not seem to be 
practical because the number of key words and discard 
words that would have to be added in the beginning 
would be at such a rate as to make it rather difficult to 
handle. Also, it would require a great deal of back-updating,
since when a new “key word” is added it must be
coded for all previous cases where it was used before it 
was selected as a key word. [See the earlier report (1).] 

It would be more practical to develop a retrieval system
by analyzing approximately 40,000 to 50,000 words of
running texts, using the key words so found as a starting
list. The other words found with a reasonable frequency 
would furnish a beginning for the discard word list. 
Actually, using this method, we compiled a list which is 
more comprehensive and more suitable for computer 
coding than the previous lists of terms that have been 
published by radiologists, such as the Code for Roentgen 
Diagnosis Indexing (6). 


• 5. Stability of Word Frequency


The studies of rank stability have implications regard- 
ing the method of searching the lists. The idea of search- 
ing by rank rather than by the alphabet has logical 
appeal. However, many questions must first be answered, 
such as: Can a stable list by rank be achieved, and if so, 
how long will it take? Once done, will the time saved be 
worth the effort? 

Our results tend to answer this last question in the 
negative for it is indeed difficult to get rank stability. 
We found that the ranks of the first several hundred high 
frequency words are approaching stability when about 
80,000 words have been analyzed. The 25 or 30 highest 
frequency words are rather apparent when only 10,000 or 
so words have been processed. Figure 3 shows the rate of 
approach to rank stability for four selected words. The
Oth ranked word is close to its final rank at the 20,000
word level, the 75th ranked word takes about 50,000
words, and the 100th ranked word uses about 80,000
words to reach a relatively stable position.

Many “key” words are used at such a low frequency
that attempts to improve list searching times by ranking
the list by frequency must come after a rather long and
tedious period of data analysis before their rank can be
adequately determined. Thus, sorting lists by individual
rank would not result in a significant improvement in
computer efficiency, except possibly for the first few very
high frequency words (the first 25 to 250 words). Relatively
long lists of words occur with a very low frequency
of use. For example, our analysis found almost
4200 words used only one time in 100,000 words, and over
1,500 words used no more than one time per 10,000. These
words account for about 70% of the 3,427 different words
found used.

Although files by rank order appear impractical, the
authors feel there is great value in searching the discard
list by rank groups, as is commonly done and as discussed
earlier. A rank group, which we like to think of as a
filter level, consists of a group of words which seem to
have nearly the same relative frequency.


• 6. Summary


The first 100,000 words found in 3,321 reports taken as
dictated from radiological (x-ray) patient case records
have been analyzed. These words came from an approximately
5% representative sample of the annual case load
in the Radiology Department at the University of
Arkansas Medical Center.

The rate of introduction of different words as opposed
to total words is plotted (broken line) in Fig. 2. Our
results seem to conform to other results reported in the
literature. The rate of introduction of new different
words levels off at about 2,500 words, and this occurs
when 40,000 to 50,000 words of text have been analyzed.
However, unclassified words continue to occur at a
significant level of almost 2% (Fig. 2, solid line) as far
out as the 100,000 word level. Noise is still at the 1%
level, which is not tolerable for our application.

Attempts to establish the rank order of words beyond
the first several hundred have not met with great success
because about 70% of the words appear to occur with 
such a low relative frequency (no more than one time 
in 10,000). 

Thus, establishing files by rank order appears impractical.
On the other hand, filter lists (discard words) by
rank groups (words which appear to have near the same 
frequency) seem quite practical. As currently designed 
we plan to use a primary filter list of the 25 to 35 highest 
frequency words and a secondary level filter of 2,500 to 
3,500 words. We are now uncertain as to whether it 
would be best to expand the secondary filter list or to 
establish a tertiary filter list to reduce the noise level 
further. We plan to continue the study. 
Relationship of Keywords in Titles to References Cited*


Some machine-produced indexes have been compared. 
The objectives of an index and the search procedure 
are analyzed. It is proposed that a hypothetical title 
be used for searching, not a question. This permits 
comparison of similar items for common characteristics. 
A relationship between "keywords in title" and "key 
references" of the same article is given. It is shown 


Comparison of various machine-produced indexes is 
given in Fig. 1 in terms of the characteristics of the 
format of the output. The characteristics which are 
considered, and their definitions, are: 


Fixed word length per word: some words are forced 
to fit into specified word length. 

Direct entry: the user enters the main index immediately.

Direct exit: the user passes immediately from the
main index to the indexed document (which may be
an abstract).

Complete reference: the entire title and location of
the document.

Coden: source journal as specified by a few letters.

Screening: minimizing interest in that portion of the
title in a KWIC index which appears before the
keyword by overprinting with a screen.

Annotation: the addition of words to a title to indi- 
cate concepts covered by the paper but not included 
in the title. 

Stop list: list used by the computer as a definition 
of words that are not to be treated as keywords. 
Titles will not be entered in the keyword index 
under these words. 

Titles per index page: this ratio is estimated by 
dividing the total titles treated by the number of 
pages in the index. The ratio is used only as an 
indicator and must not be thought of as a value 
judgment. (Should the erroneous attitude be taken 


* Hawaii Institute of Geophysics Contribution No. 156. This paper is
a condensation of a laboratory report (Adams, 1965).
that for an efficiently produced article there will be
more references than keywords in the title. This is 
because, according to the basic theorem of linear 
programming, there will be as many concepts in the 
article as there are completely utilized references. 
The structural model proposed requires additional ex- 
perimental verification. 


WM. MANSFIELD ADAMS 


Hawaii Institute of Geophysics
University of Hawaii


that a high value of this indicator is desirable, then 
to attain perfection it would only be necessary not 
to issue an index!) No correction for page size has
been made.

Titleless: nowhere in the system is the complete title
given.


Detailed discussion of each index is given in Adams 
(1965). 
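The role of the stop list in a keyword-in-title index, as defined in the list above, can be shown with a minimal sketch. It is not modeled on any of the indexes compared in Fig. 1; the function name, the punctuation handling, and the sample stop list are assumptions.

```python
def keyword_index(titles, stop_list):
    """Build (keyword, title) entries, skipping every word on the stop list."""
    entries = []
    for title in titles:
        for word in title.split():
            key = word.strip(".,;:()\"'").lower()
            if key and key not in stop_list:
                entries.append((key, title))
    return sorted(entries)


if __name__ == "__main__":
    titles = ["Theory of Information Retrieval"]
    for keyword, title in keyword_index(titles, stop_list={"of", "the", "a"}):
        print(f"{keyword:12} {title}")
```

Each title is entered once under every word that survives the stop list, so lengthening the stop list directly reduces the number of index entries per title.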


• General Process of Indexing


The process of searching for information may be 
diagrammatically illustrated as in Fig. 2. In a very 
general sense, the user approaches the searching system 
with certain necessary input information. For the ma- 
jority of the systems considered in Fig. 1, this is in 
the form of a list of concepts related to the user’s cur- 
rent interests. In the special case of the Science Citation 
Index (2), the input information would be a scientific 
paper known to be of interest. 

In some search systems the user goes directly to the 
main “scan” index and then to the document indexed 
in the “scan” subsystem. In other indexes it is neces- 
sary that the user compare his input information with 
a preindex. This serves to translate the input into a 
form useful in the “scan” index, for example, in the 
Universal Decimal System. Alternatively, it is possible 
that the search involves a postindex. That is, the user 
is directed from the “scan” index to a subsystem having 
more information concerning the desired document than
provided by the “scan” system, in particular, the
location of the document. A postindex need not be
compulsory; if the source location is denoted by the code
in the “scan” index, then the user has the option of
going directly to the document or going to the
postindex. It is quite likely, however, that the user will
choose to go to the postindex when the title has been
severely truncated in the “scan” index.

Fig. 1. Comparison of some properties of the output of some machine-produced indexes

Several of the indexes treated here actually index
abstracts. Conceptually, the collection of abstracts could
be considered as a postindex which gives the user
additional information concerning the original article. He
may then proceed to the original document or terminate
interest, depending on the relevance indicated by the
abstract.


* Theory of Information Retrieval


One recent quantification of the information-retrieval 
process has been given by Goffman (1964). Briefly, 
the answer to a query is obtained by maximizing an 
evaluation function of the system output in terms of 
the probability of relevance of each item of the output
to the query. For this definition of an answer, the
necessary and sufficient conditions for a set to be an
answer to a question are determined. Although conceptually
helpful, this formulation is not operationally oriented
because the probability of relevance of every item in the
set of documents is not operationally defined. One
concept, taken as known, is the critical probability of
relevance, which acts as a point of truncation for the
sequence of documents as ordered according to decreasing
probability of relevance. The function of the
information-retrieval system is to locate that subset of documents
for which the probability of relevance is greater than
the critical probability. Goffman places great weight on
the importance of the order in which the documents are
presented to the user.

Fig. 2. Diagrammatic flow chart of the searching process


* An Operational Definition of Relevance 


We will now develop an operational definition for the
term “relevance.” (For an alternative and more
profound effort to operationally define “relevance,” see
Hillman, 1964a, 1964b.) We do this by analysis of the
searching process. The steps in searching are usually
as follows:


1. State the problem in terms of concepts represented
by words.

2. Classify the words as significant or insignificant.

3. For some significant word locate all those titles
containing that word. This forms a subset.

4. Using each significant word in the problem
statement, repeat the above steps and form the collection
of subsets.

5. There is now available a subset of titles, say M_1,
each one of which contains at least one of the
significant words in the problem statement. The
number of all such titles will be an integral
quantity, say M_1.

6. For some pair of significant words, locate all those
titles in M_1 containing both significant words. This
forms a subset.

7. Repeat the above step for each pair of significant
words.

8. We will now have available a collection of subsets
of titles which have at least two of the significant
words. We call this the subset M_2. (That this
procedure is really used is verified by a recently
introduced index.)

9. In a similar fashion, derive the sets M_3, M_4, etc.,
until the subset is the null set or until the number
of included significant words equals the total
significant words in the problem statement.

(Note that the subsets M_1, M_2, M_3, . . . are nested.)
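
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the stoplist, the sample titles, and the function names `significant_words` and `nested_subsets` are hypothetical and do not come from any of the indexes reviewed. The sketch counts, for each title, how many significant words it shares with the problem statement, and collects the titles sharing at least k of them into the k-th subset.

```python
import re

def significant_words(text, stoplist):
    """Lower-case the text, split it into words, and drop stoplist words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in stoplist}

def nested_subsets(problem_statement, titles, stoplist):
    """Return {k: titles sharing at least k significant words with the problem}."""
    query = significant_words(problem_statement, stoplist)
    overlap = {t: len(query & significant_words(t, stoplist)) for t in titles}
    subsets = {}
    for k in range(1, len(query) + 1):
        members = [t for t in titles if overlap[t] >= k]
        if not members:      # step 9: stop once the subset is the null set
            break
        subsets[k] = members
    return subsets

# Hypothetical stoplist and titles, used only for illustration.
STOPLIST = {"of", "the", "in", "for", "and", "a", "an", "on"}
TITLES = [
    "Indexing of Abstracting Services in Technical Libraries",
    "Technical Libraries",
    "Machine Indexing of Geophysics Abstracts",
    "Seismic Waves in the Pacific",
]

subsets = nested_subsets("indexing in technical libraries", TITLES, STOPLIST)
sizes = [len(subsets[k]) for k in sorted(subsets)]
print(sizes)
```

Because a title sharing k significant words also shares k - 1 of them, each subset is contained in the one before it, so the list of sizes cannot increase.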


We may then represent the number of titles in each
subset versus the number of included significant words
as a monotonically decreasing histogram, such as in
Fig. 3. We may now interpret a KWIC index in terms
of this bar graph and the definition of relevance. Use
of the KWIC index is equivalent to starting from the
left and progressing to the right. Furthermore, it is
necessary to progress all the way to the right, that is,
the relevance level of every member of the coded set
must be determined. A partial ordering by relevance
is now evident. Any item in a subset to the right of a
given subset is said to be more relevant than any item
in the given subset. Consequently, the item having the
highest relevance is that having the greatest number of
significant words from the problem statement. Having
defined relevance as an objective partial ordering, there
no longer seems to be any value in the concept
“probability of relevance.”

Fig. 3. Schematic histogram illustrating the monotonicity
of higher-order subsets. The order of the subset M is
defined by the number of keywords which exist in [...] the
model title and the [...] title.

Effects other than relevance need to be included. One
we wish to emphasize here is the “assimilation time.”
Any user interacting with an information-retrieval system
will require a definite amount of time to assimilate any
one of the documents. This is defined to be the
assimilation time. Because a user's time (and memory)
are limited, there will be a limit on the number of
documents he can assimilate; this might be termed the
“assimilation limit.” This limitation might truncate
a search before the “critical relevance” (corresponding
to Goffman's “critical probability of relevance”) is
reached. We introduce a symbology similar to that of
Goffman. Following this notation, with appropriate
modifications and extensions,


S = set of source documents
q = question
u = user
x = element of coded set for S
r_x = relevance of S(x) to q (the relevance of x to q
will be used as an estimate of r_x)
R_c = critical relevance
t_x = time required by the user to assimilate S(x)
T_u = assimilation limit of the user.


Introduction of the time element requires elaboration
of Fig. 3. Assuming the distribution of assimilation
time to be approximated by a Poisson distribution, we
give such an elaboration in Fig. 4 for a continuous
and finite distribution with respect to both relevance
and assimilation time. Now each document S is
characterized by a code, x, a relevance dependent on the
question and estimated by r_x, and the assimilation time
t_x, dependent on the user. The objective will be to
maximize the number of most relevant documents assimilated
within the assimilation limit, or the critical relevance.
This means, graphically, starting at the “tail” and
assimilating into the “hill,” and, for documents of equal
relevance, always using documents of short assimilation
time before documents of long assimilation time. Note
two features. First, the searching procedure might be
improved if the user approached the retrieval system
with a hypothetical title instead of a question. This
is suggested because the code for each of the documents
being searched is a title. The hypothetical title should
be so posed that the user would expect a paper having
that title to provide him with the answer to his problem.
This type of retrieval process shifts additional
work to the user, but the increased efficiency in using
the index might be adequate justification. Verification
of this hypothesis is provided by semantically analyzing
the procedure of Lancaster and Mills (1964). Although
they repeatedly describe the searching process as using
“questions” for input, they give as an example (p. 12):
“We now search these [uniterms] one at a time; e.g.,
a question on ‘flow solutions for chemically reacting
gas mixtures’ would be searched for first. . . .” Note
that the “question” is presented as a “hypothetical title.”
Nor is this an exception (see p. 6).

Fig. 4. Graphical illustration of general relationship among
the assimilation time, t, the order of the subset, M, and
the number of titles. The “tail” consists of those titles
having the highest value of M; the “head,” of those having the
lowest value of M.
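
The selection rule stated above (most relevant documents first and, among documents of equal relevance, those of short assimilation time first) might be sketched as follows. The function name, the sample data, and the choice to stop at the first document that would exceed the assimilation limit are assumptions made for illustration; relevance is taken to be the order of the subset to which the title belongs.

```python
def select_documents(candidates, critical_relevance, assimilation_limit):
    """candidates: list of (code, relevance, assimilation_time) triples.

    Documents are taken in order of decreasing relevance, shortest
    assimilation time first among equals. The search is truncated when
    the relevance falls below the critical relevance (assumed inclusive
    here) or when the next document would exceed the assimilation limit.
    """
    ordered = sorted(candidates, key=lambda d: (-d[1], d[2]))
    chosen, used = [], 0.0
    for code, relevance, time_needed in ordered:
        if relevance < critical_relevance:
            break                      # critical relevance reached
        if used + time_needed > assimilation_limit:
            break                      # assimilation limit reached
        chosen.append(code)
        used += time_needed
    return chosen, used

# Hypothetical documents: (code, order of subset, assimilation time in hours).
candidates = [("A", 3, 2.0), ("B", 2, 1.0), ("C", 3, 1.0),
              ("D", 1, 0.5), ("E", 2, 4.0)]
chosen, used = select_documents(candidates, critical_relevance=2,
                                assimilation_limit=5.0)
print(chosen, used)
```

In this example the search is truncated by the assimilation limit before the critical relevance is reached, which is the situation described in the text.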

Secondly, it should also be noted that the operational
definition given here for relevance can be refined. That
is, instead of defining relevance by number of included
significant words in the title, the user forms a
hypothetical abstract. The abstracts are used as the codes
to the set of documents. The hypothetical abstract is
compared with each abstract. As an extreme, this
approach can be expanded to include the entire article.
Such does not appear to be feasible with the current
computing equipment (nor with the usual reluctance
of users to precisely formulate the input to a retrieval
system).


* Relationship of Title Words and References


Of the several types of indexes reviewed in Fig. 1, 
the Science Citation Index appears to differ most from 
the others. The differences among the various types of 
indexes are important because index makers sometimes 
make more than one type of index for a given set of 
documents. Because the assimilation limit is essentially
a time limit, the users must then make a decision 
concerning which index, or how many indexes, will be 
used to search the existing documents. In making this 
decision, it is worthwhile to realize the relationships 
between the different types of indexes. Here we will 
discuss the relationship of the title-oriented index to the 
reference-oriented index. We utilize the theory of linear 
programming.

We consider the author of an article during the entire
process of creating the article. Assume that the author
will operate at optimum efficiency. One of the resources
of the author is time, T. This time may be expended
in a variety of activities, each of which contributes to
the resulting article. Let us designate the concepts
included in the resulting article as C_1, C_2, ..., C_n. For
mathematical convenience, we quantize “concept” and
consider it to be composed of “conceptlets.” The
activities that the author may perform in furtherance of
the creation of the paper are several: we designate
these as X_1, X_2, X_3, ..., X_m. For example, the first
activity might be reading document “one”; the second
activity, reading document “two”; etc.


Now each activity, performed for a unit time, will
contribute a various number of conceptlets to each of
the several concepts. Thus, from performing activity
one for a unit time, the author will obtain a_{11}
“conceptlets” related to C_1, a_{21} related to C_2, etc. Let the
amount of time spent on X_j be the intensity x_j. Then,
in general,

C_i = \sum_{j=1}^{m} a_{ij} x_j    (i = 1, 2, ..., n)

There will be a limit as to the time which can be
spent in any particular activity and the productivity
remain directly proportional to the time. We limit this
time to be less than y_j. By assumption, only linear
response is considered.
Furthermore, all the time spent on all the activities
must be less than, or equal to, the time available, T.
The author will consider the various concepts to have
certain relative values, say u_1, u_2, etc. (For numerical
values, we might just use the number of words he would
like to write on that concept.)

Based on the foregoing ideas and terminology, we
may formulate the following linear-programming
problem. Maximize the objective function

Z = u_1 C_1 + u_2 C_2 + \cdots + u_n C_n

where

C_i = \sum_{j=1}^{m} a_{ij} x_j    (i = 1, 2, ..., n)

subject to conditions:

\sum_{j=1}^{m} x_j \le T

x_j \le y_j    (j = 1, 2, ..., m)

Under known conditions this problem will have a
solution. The important aspects here are not
necessarily the detailed solution, but the general properties
of the solution. We consider these now.
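
A small numerical instance of the program just formulated may make its structure more concrete. The sketch below is not part of the original treatment: the coefficients are hypothetical, the non-negativity of the intensities x_j is an added assumption, and the function name `solve_author_program` is invented for illustration. It relies on the fact that substituting C_i into Z gives Z = sum over j of w_j x_j with w_j = sum over i of u_i a_{ij}, so that, with a single time budget T and an upper limit y_j on each activity, an optimum is obtained by filling the activities in order of decreasing w_j.

```python
def solve_author_program(a, u, y, T):
    """Maximize Z = sum_i u[i] * C_i, where C_i = sum_j a[i][j] * x[j],
    subject to sum_j x[j] <= T and 0 <= x[j] <= y[j].

    The lower bound 0 on x[j] is an assumption added for this sketch.
    """
    n, m = len(u), len(y)
    # Value contributed to Z per unit time spent on activity j.
    w = [sum(u[i] * a[i][j] for i in range(n)) for j in range(m)]
    x = [0.0] * m
    remaining = T
    # Fill the most valuable activities first, each up to its limit y[j].
    for j in sorted(range(m), key=lambda j: -w[j]):
        if w[j] <= 0 or remaining <= 0:
            break
        x[j] = min(y[j], remaining)
        remaining -= x[j]
    C = [sum(a[i][j] * x[j] for j in range(m)) for i in range(n)]
    Z = sum(u[i] * C[i] for i in range(n))
    return x, C, Z

# Hypothetical data: n = 2 concepts, m = 3 activities (documents to read).
a = [[2.0, 1.0, 0.0],   # conceptlets per unit time contributed to C_1
     [0.0, 1.0, 3.0]]   # conceptlets per unit time contributed to C_2
u = [1.0, 2.0]          # relative values of the concepts
y = [1.0, 1.0, 1.0]     # time limit on each activity
x, C, Z = solve_author_program(a, u, y, T=2.5)
print(x, C, Z)
```

In this instance two activities are used to capacity and one only partially, because the time budget T runs out before the third limit is reached.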
Applying the basic theorem of linear programming,
we find that under an optimum arrangement, there will
be as many concepts in the created article as there are
references completely utilized (see Baumol, 1963). Thus,
if the author desires to include n concepts, only n
references will be used to capacity, in general. In
other words, in general, in an efficiently produced article,
there will be more references than concepts.

We have found a relationship between the concepts
covered by an article and the references on which the
article is based. Our intention is to derive a relation
between title words and the references; therefore we
need a relationship between title words and concepts.
This is provided, in an empirical sense, by the Oceanic
Coordinate Index, under its “Citations” index. In
addition to the title, annotation is given. We estimate
that there are approximately as many concepts as there
are keywords in the title. Each keyword represents one
concept, by assumption.

Using the foregoing findings, we can relate the
keywords in the title of an article to the references of
that article. We predict that for a given article,
efficiently produced, there will be more references than
keywords. This expectation has been checked using
estimates from articles in scientific journals. The actual
ratio is probably about three references to one keyword.
Inspection reveals that the references are of two types,
those pertaining to data, and those pertaining to
concept.* We estimate that about half the references
pertain to data, hence, to a first approximation, we
estimate that:

\frac{\text{number of conceptual references}}{\text{number of keywords-in-title}} \approx \frac{3}{2}

Defining a “key reference” to be a “conceptual
reference” which is fully utilized, we may hypothesize:

\frac{\text{number of key references}}{\text{number of keywords-in-title}} \approx 1

Additional investigation is necessary to confirm the
foregoing estimates and to develop operational definitions
of the terms.
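
The arithmetic behind the first estimate can be written out as a short sketch. The article counts used here are hypothetical, and the function names are invented for illustration; only the figures of about three references per keyword and about half the references pertaining to data are taken from the text.

```python
def references_per_keyword(n_references, n_title_keywords):
    """Ratio of references to keywords in the title for one article."""
    return n_references / n_title_keywords

def conceptual_references_per_keyword(ratio, data_share=0.5):
    """Remove the share of references that pertain to data rather than concept."""
    return ratio * (1.0 - data_share)

# Hypothetical article: 15 references and 5 keywords in the title.
ratio = references_per_keyword(15, 5)
conceptual = conceptual_references_per_keyword(ratio)
print(ratio, conceptual)
```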

Note that the foregoing treatment has implied that 
prior documents (the references) are the only source 
of material for constructing a concept. This limitation 
has facilitated the presentation. Actually, of course, 
"nature" may be treated as a "document" and new 
data (facts) or time-space relationships attributed to 
that reference. "Nature" might appear in the
bibliography in the form of a laboratory notebook, field book, etc.

Note that the situation may be nonlinear.
Consideration of the extreme cases makes this possibility
immediately apparent. Thus, one would intuitively expect
a book or review article with a very short title, such as
“Technical Libraries,” to have more references than
one with more significant words in the title, such as
“Indexing of Abstracting Services in Technical
Libraries.” At the other extreme, a paper dealing with
such a restricted subject that the only reference is to
one of the author's own previous papers might be
expected to have an extremely long title in order to
properly circumscribe the very detailed topic being
covered. We might hypothesize that the relationship
between the number of references and the number of
keywords is as shown in Fig. 5. Indeed, this
relationship could be used to define the nature of a document.
For example, a document falling between the origin
and K_1 would be a book, between K_1 and K_2 would
be an article, and a document lying beyond K_3 might
be defined as an exercise. Items falling in the K_2-K_3
range would be article-exercises. This illustrative
notation is not suggested for actual application without
further study to confirm our hypothesis about the shape
of the basic relationship between keywords and key
references.

Another theorem of linear programming is especially
applicable to solving our earlier dilemma over which
of the criteria, relevance or assimilation time, would
be dominant. It is shown in linear programming theory
that “a program is the most efficient if and only if it
contains included activities such that no excluded
activity contributes more to the objective function than
an equivalent combination of included activities”
(Dorfman, Samuelson, and Solow, 1958, p. 164). Thus, the
interrelationship of the assimilation time (the
intensity, x_j) and the relevance (the productivity, a_{ij}) becomes
clearer by assuming the existence of relative value
(u_i). An expression for the marginal value of the created
article in terms of the source references may be
obtained from Farkas' theorem (Dorfman, Samuelson,
and Solow, 1958, p. 191). However, there seems little
point in giving the details here since the relative values
are defined subjectively.

* A similar division is made by Lancaster and Mills (6, p. 9) in
discussing the applications of a law relating “recall” and “relevance.”

Fig. 5. Hypothetical relationship of references to keywords
for books, articles, or exercises

The expansion of the foregoing interpretation to 
nonlinear and dynamic conditions is relatively straight- 
forward. 

This work reports on the format. As evidenced by 
Index Medicus, the format is highly machine-dependent. 
For information about development of instruments, ma- 
chines, and equipment for data storage and retrieval, 
such as are relevant to indexing, see Elliott (1964). 


* Conclusions


An operational definition for relevance has been given
in terms of the concepts in the posed question included
in the title of the document.


The searching process is outlined to be: 


1. State the problem in terms of concepts
represented by words.

2. State hypothetical title of a hypothetical article
which would be expected to answer the posed
question.

3. Classify the words as significant or insignificant.

4. For some significant word, locate all those titles
containing that word. This forms a subset.

5. Using each significant word in the problem
statement, repeat the above steps and form the
collection of subsets.

6. There is now available a subset of titles, say
M_1, each one of which contains at least one of
the significant words in the problem statement.
The number of all such titles will be an integral
quantity, say M_1.

7. For some pair of significant words, locate all
those titles in M_1 containing both significant
words.

8. Repeat the above step for each pair of significant
words.

9. There is now available a subset of titles which
have at least two of the significant words. This
we call subset M_2.

10. In a similar fashion, derive the sets M_3, M_4, etc.,
until the subset is the null set or until the number
of included significant words equals the total
significant words in the problem statement.


Use of a hypothetical title for searching a list of titles 
permits comparison of similar items for common char- 
acteristics. 




A proof is cited that relates the references to the 
concepts covered by an article. The relation of key 
references to keywords is then empirically estimated by 
ascertaining the keyword-concept relationship. 

Goffman has noted the possible significance in the 
order in which the documents are presented to the user. 
Probably much more important is the relative order 
of importance of keywords in a title. If authors were 
to order the keywords by order of importance rather 
than by grammatical rules, the hypothetical title could 
be similarly ordered. Comparison could then be made
for the order of the joint keywords. 
A Multiple Testing of the Natural Language Storage and
Retrieval ABC Method: Preliminary Analysis of Test Results*


After a brief summary of the test program, the
statistical results tabulated as over-all “ABC-Relevance
Ratios” and “ABC-Recall Figures” are presented and
reviewed. An abstract model developed in accordance
with Max Weber's “idealtypus” is used in discussing
such observations as the absence of the detrimental
effects of an inverse relationship of Relevance and
Recall upon a system's effectiveness. The increase of
Recall in proportion to the number of documents
retrieved is attributed to the ABC-system's peculiar
capability of making the user an integral part of the system.


BERTHOLD ALTMANN


Harry Diamond Laboratories
Washington, D. C.


* Introduction


The principles and operations of the ABC storage and
retrieval method were briefly outlined in a presentation
at the 151st National Meeting of the American Chemical
Society (1), and the preparations and the procedures of
the test designed to assess the capability of the method
were the subject of a detailed technical report (2). In
this report we discussed the test program and its
principles, the selection and organization of the collection,
the preparation and standardization of the requests, the
procedures and forms used in the retrieval operations to
record results, the methods of evaluating the data, and
the transfer of the evaluated data to the summary sheets.
In the present article we will deal primarily with a
preliminary analysis of test results. A mathematical model
and the final statistical analysis of the accumulated data
are being processed for publication. However, the
information presented in this factual report is consistent
with the conclusions of reference (3).

* I am indebted to Dr. Maurice Apstein, Messrs. Theodore B. Godfrey
and Hoyt W. Sisco in HDL, and to Mr. Martin H. Weik, Jr., in ARO for
valuable suggestions, and to Mr. Hoover Ogata for the critical comments
that prompted the inclusion of additional evidence and a stronger
formulation of the presented thesis. Discussions with Mr. F. W.
Lancaster and an exchange of written communications with Mr. Cyril
Cleverdon have contributed to a better mutual understanding; they
[...] differences of opinions, in particular those based upon the
[...]ic variations of the systems tested and the HDL plan of evaluating
variables of the first-generation ABC model for the express purpose
of gaining the information necessary to design a system that can
[...]y the user. Finally, I acknowledge with gratitude the help and
advice given to me by Mr. R. A. Fairthorne.

1 The presentation is already dated in several respects. For example,
the plan of introducing subject schemes to organize large clusters of
descriptive phrases had to be abandoned for a more practical and eco-

2 The Army Research Office, Scientific and Technical Information
Division, Washington, D. C., supports the development of the ABC-System
as a contribution to the Army Technical Library Improvement Studies.

Table 1 briefly outlines the organization of the test 
operations. Two types of requests were prepared by 
41 HDL scientists and engineers (Group 1 of Group I): 
(a) 225 requests based on the contents of papers ran- 
domly selected from the entire test collection; and (b) 
36 requests based only upon a general knowledge of the 
subject areas covered by the collection. The control group
(Group II), consisting of 31 senior scientists and
engineers (including members of DOD, Air Force and Navy
research agencies, and the National Bureau of Standards),
standardized all requests with respect to form and
content, reducing the number of document-based requests
from 225 to 100 in the process.

For the retrieval, Group 1 was divided into two teams
(1a and 1b). Each team processed 50 requests that its
own members had helped to formulate and 50 requests
formulated by members of the other team. This
procedure was used to determine if any bias was introduced by
having an operator retrieve information in response to
requests he himself had formulated. In addition, both
teams processed the 36 freely styled requests. Also used
in retrieving were Groups 2 and 3 of Group I. Group 2
included six professors from George Washington
University, who took the place of HDL branch and laboratory
chiefs, since the latter were not available for the
experiment. Group 3 included six members of the HDL Library
Staff.

3 Organization and interpretation of the tabulated data were [...]
to a continuing review following standard statistical [...]
diligently and resourcefully handled by W. H. Menden, who, in turn,
could rely on the advice and guidance of B. M. Kurkjian.

Table 1. Test Outline

Group I (Group 1†: 1a, 1b; Group 2‡; Group 3§)    Group II*

I. Preparation of Requests:
     Type a: Document-based
     Type b: Freely styled
II. Standardization of Requests:
     Type a reduced to:  100
     Type b:  36
III. Test Runs:
     Type a: Own requests  50
     Type a: Counterpart's requests  50  100  100
     Type b: Requests  36  36  36
     Total  136  136  136
IV. Pre-evaluation of results  136  136
V. Final evaluation of results  136

* 31 Senior scientists and engineers including those of other agencies
† 41 HDL scientists and engineers
‡ 6 Analysts (George Washington University)
§ 6 HDL Librarians

While the first group (two teams) of 41 bench workers 
represented 77 percent of the population of operators and 
each of its members was assigned about 5 percent of the 
total test requests according to their individual interests, 
the remaining two groups (2 and 3) each represented 
11.5 percent of all operators, and individual members 
were assigned 16 to 17 percent of all test requests. This
distribution of operators and requests was intended to
simulate a realistic situation, where a bench worker
studied and probed a relatively narrow subject area, 
while a supervisor or librarian covered a multiplicity of 
subjects. 

The operators used three tools for retrieval: (1) the
ABC method with a short dictionary (Version I); (2) the
ABC method with a long dictionary (Version II); and
(3) the KWIC title list prepared for all documents
included in the collection. The ABC dictionaries were
KWIC-type listings of index phrases describing the
contents of the collection and differed merely in the amount
of detail displayed. The KWIC title list was used for
comparison as well as a control.

An operator processed the same request in successive 
runs with three different tools with individual test runs 
normally separated by a lapse of a day. All operator 
groups tested the short ABC dictionary (Version I) first; 
normally, a retrieval with the KWIC title list followed, 
and Version II was tested last. To determine any operator
bias, Group 1b tested ABC Versions I and II in order,
using the KWIC title list last.

For the determination of r, the number of relevant 
items in the collection for each request, the members of 
Group II followed a fixed procedure designed to guarantee 
a high degree of objectivity. Essentially, this process in- 
volved the following three steps: 


1. Analysis of the wording of the requests to form
linguistic building blocks (concepts) and [...] the
minimum combinations of these that would be acceptable.

2. Determination of the total number of different
documents retrieved for each request that were relevant by
the above criterion.

3. Determination, if possible, of additional relevant items
not retrieved by the test operators.


To determine the relevancy of an item retrieved in 
response to a particular request, the evaluators estab- 
lished a ranked list of linguistic building-blocks for the 
request; each block represented a conceptual unit. These 
conceptual units were then compared with the contents 
of the document, and relevancy was assigned if, of several 
predetermined combinations of conceptual units, one was 
found to be descriptive of the document. 
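
The relevancy rule just described can be expressed as a short sketch. The conceptual units, the combinations, and the function name `is_relevant` are hypothetical examples chosen for illustration; the evaluators' actual lists are not reproduced in this article. A document is judged relevant if at least one of the predetermined combinations of conceptual units is wholly contained in the set of units describing the document.

```python
def is_relevant(document_units, acceptable_combinations):
    """True if any predetermined combination of conceptual units
    is entirely descriptive of the document."""
    described = set(document_units)
    return any(set(combo) <= described for combo in acceptable_combinations)

# Hypothetical request: conceptual units and their minimum acceptable combinations.
acceptable_combinations = [
    {"semiconductor", "radiation damage"},
    {"transistor", "neutron irradiation"},
]

doc_1 = {"semiconductor", "radiation damage", "annealing"}
doc_2 = {"semiconductor", "annealing"}
print(is_relevant(doc_1, acceptable_combinations),
      is_relevant(doc_2, acceptable_combinations))
```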

In the next step, the total number of different relevant
items retrieved in response to that request was determined 
by comparing the 12 sets of relevant documents retrieved 
for each request (all requests were tested by four opera- 
tors using three different tools, the two ABC Dictionaries 
Versions I and II, and the KWIC list). 

In addition to this, the members of Group II used
various lengthy and sophisticated retrieval tools and
procedures not available to the test operators to
determine if relevant items were in the collection that had not
been retrieved. In particular, they checked a systematic
card catalog⁴ consisting of abstract cards with cross
references and supplemented by an alphabetical index.
Members of Group II also employed specially constructed
retrieval loops (2, pp. 11, 16-17) as control mechanisms
to check for further relevant materials. Finally, the
quantity r was obtained as the sum of the different relevant
items found by the test operators and the members of
Group II.
With a value of r asserted for each request and the
quantities x (number of relevant items retrieved) and n
(number of items retrieved) determined for each
retrieval run, individual Relevance Ratios (sometimes called
Precision Ratios) defined as x/n and Recall Ratios
defined as x/r were calculated. For statistical reasons,
averages of these ratios (for groups of observations) were
calculated by averaging the numerators and denominators
separately, i.e., for k observations of x_i/n_i (i = 1, ..., k), we
obtained:

(x/n)_{av} = \frac{\sum_{i=1}^{k} x_i}{\sum_{i=1}^{k} n_i}

Among the averages obtained were those for all 136
requests processed by a given group of retrieval operators,
or for all four groups of operators using either version of
the ABC dictionary (see Tables 3 and 4). Statistical tests
(χ² analysis) were made to insure the validity of these
averages; the results are recorded in Table 2. The tests
produced no evidence that average Relevance Ratios
obtained for different sets of data could not be validly
combined.
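
The averaging rule described above, in which numerators and denominators are summed separately, can be contrasted with the simple mean of the individual ratios in a short sketch. The run data and the function names are hypothetical; the sketch also assumes that the Recall averages were pooled in the same manner as the Relevance averages, and that r is the number of distinct relevant items found by the operators and by Group II together.

```python
from fractions import Fraction

def total_relevant(retrieved_relevant_sets, extra_relevant=frozenset()):
    """r: number of distinct relevant items over all retrieval runs,
    plus any additional relevant items found by the control group."""
    found = set(extra_relevant)
    for items in retrieved_relevant_sets:
        found |= set(items)
    return len(found)

def pooled_ratios(runs):
    """runs: list of (x, n, r) per retrieval run.
    Returns (sum x / sum n, sum x / sum r)."""
    sum_x = sum(x for x, n, r in runs)
    sum_n = sum(n for x, n, r in runs)
    sum_r = sum(r for x, n, r in runs)
    return Fraction(sum_x, sum_n), Fraction(sum_x, sum_r)

def mean_relevance_ratio(runs):
    """Simple mean of the individual ratios x/n, shown for comparison only."""
    return sum(Fraction(x, n) for x, n, r in runs) / len(runs)

# Hypothetical runs: (relevant retrieved, items retrieved, relevant in collection).
runs = [(2, 4, 5), (9, 10, 10)]
relevance, recall = pooled_ratios(runs)
print(relevance, recall, mean_relevance_ratio(runs))
print(total_relevant([{"d1", "d2"}, {"d2", "d3"}], {"d9"}))
```

The pooled average weights each run by the number of items it retrieved, so a run retrieving many items influences the result more than a run retrieving few.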
Of the average Recall Ratios, however, only those
representing the total (Versions I and II) results for a given
group of testers and a given set of questions were found
suited to form meaningful combinations. To stress the
fact that average Recall Ratios as well as their
combinations are of limited significance, we will refer to them
in the subsequent discussions as average Recall Figures.

Furthermore, based upon evidence of the test
(Table 2), we assumed the existence of an ABC system
parameter for Relevance. The parameter, which is called
“ABC-Relevance” in this article, will be estimated by the
grand total average over all Relevance Ratios observed
for the ABC system.

For the same reason, we cannot assume the existence of
a similar systems parameter for Recall, to be estimated by
averaging Recall Ratios. Although we have consistently
calculated averages and have called them Recall Figures,
we have done so only to facilitate a discussion of the
problems involved.


4 To obtain this catalog, the test collection had to be indexed under
the classification of “Solid State Abstracts.”

5 The problem of arriving at a Recall Ratio that is representative of
the test or the ABC method is analyzed in reference (3).


TABLE 2. Results of Statistical Tests Performed to Justify
the Combination of the Different Sets of Data

(All decisions were made at a level of significance of 5%)

I. Is the combination of the four sets of data obtained for
each retrieval tool from the four groups of retrieval
operators acceptable?

Method: χ² analysis on a four by two contingency table.

                                        ABC Dictionary
                                 Version I           Version II
                              Relevance  Recall   Relevance  Recall
36 Freely styled requests        yes       no        yes       no
100 Source document requests     yes       no        yes       no

II. Is the combination of the two sets of data obtained
from each group for each set of requests in testing
ABC Versions I and II acceptable?

Method: Computation of the standard error of the
average difference.

                              Relevance Ratio   Recall Ratio
36 Freely styled requests          yes              yes
100 Source document requests       yes              yes

III. Is the combination of the two sets of corresponding
data obtained from the 36 freely styled and the 100
document-based requests acceptable?

Method: Computation of the standard error of the
differences.

                              Relevance Ratio   Recall Ratio
Short Dict. (ABC Version I)        yes              no
Long Dict. (ABC Version II)        yes              no
Combination of I and II            yes              no
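
The χ² test named under Method I of Table 2 can be sketched as follows. This is only an illustration of the general technique: the counts are hypothetical (the article does not print the contingency tables), the function name is our own, and 7.815 is the standard 5 percent critical value for 3 degrees of freedom, which is what a four by two table has.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all rows shared the same column proportions
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# HYPOTHETICAL counts, one row per operator group:
# (relevant documents retrieved, non-relevant documents retrieved)
hypothetical_counts = [[52, 5], [55, 6], [71, 12], [74, 11]]

# Degrees of freedom for a 4 x 2 table: (4 - 1) * (2 - 1) = 3
CRITICAL_VALUE_5_PERCENT_DF3 = 7.815

statistic = chi_square_statistic(hypothetical_counts)
may_combine = statistic < CRITICAL_VALUE_5_PERCENT_DF3
print(f"chi-square = {statistic:.2f}, combination acceptable: {may_combine}")
```

A statistic below the critical value means the four groups show no significant difference at the 5 percent level, which corresponds to a "yes" entry in the table.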


The employment of four different operator groups,
three retrieval tools, two retrieval sequences (I, KWIC,
II; I, II, KWIC), and essentially three types of requests
(freely styled, document-based “own” and document-based
“other”) produced a complex test result with four
variables specifying each run. If the distinction between
document-based “own” and “other” is disregarded, since
they compensate for each other in a comparison of
Groups 1a and 1b, there still remain four groups (1a and
1b distinguished by the retrieval sequence), three tools,
and two types of requests, which result in 24 sets of data.
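
The count of 24 follows from multiplying the three remaining variables. A small sketch, in which the label strings are our own shorthand for the groups, tools, and request types named above:

```python
from itertools import product

# Labels are illustrative shorthand, not identifiers used in the test itself
groups = ["1a", "1b", "2", "3"]
tools = ["ABC Version I", "ABC Version II", "KWIC title list"]
request_types = ["freely styled", "document-based"]

# Every (group, tool, request type) combination is one set of data
data_sets = list(product(groups, tools, request_types))
print(len(data_sets))
```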

Before we enter into the discussion and interpretation
of the results we must raise the question of the general
usefulness or validity of test data obtained for a
particular system in a particular test environment. Without
a clear understanding of (1) the systems themselves,
(2) the testing methods (by necessity adjusted to evaluate
the operational capability of the individual systems),
TABLE 3. Average Relevance Ratios and Recall Figures for 36 Freely Styled Requests
ABC Version I (Short) ABC Version II (Long) I and II Combined KWIC Title List 


Relevance Recall Relevance Recall Relevance Recall Relevance Recall 


—.—I EN — —— — — — a — — ——— — — —— n  —— nn — — — — 


Group la l 
52 52 53 54 105 106 02 62 
Group ib 
55 55 69 69 124 124 59 59 
I>T—+KWIC ei 91.7 36i 21.1 $0 86.3 Jm 24.8 148 88.6 535 23.0 s 71.1 278 21.2 
Group 2 
2 71 71 82 : 82 153 153 85 65 
>KWIC=J11 8 85.5 35 25.5 99 91.1 278 29.5 18 88.3 556 21.5 93 69.9 778 23,4 
Group 3 l | I 
; ' 15 74 11 72 146 146 46 46 i 
| 253 63 275 69 . . 528 132 232 58 
2 s — —) (0 — — == 
Group average* = 88.2 (sa) (23.6) Z7 88.7 (2) (5.7 ' xs 894 (ss =) (24.4) m 67.3 zc (21.5) 
Í . 154 : 
Team averaget [84.5] (5 [55.4] [87.8] 159.7] [86.2] (78. 4l 
*Figures in parentheses are combinations shown to be not valid. (See section titled “ABC Relevance and Recall Figures.”)
†Team average is the accumulated result of 1a, 1b, 2, and 3, with all duplications eliminated and with each discrete document counted in the accumulated z.
NOTE: Although r is theoretically constant (278), it is reduced in several sets of data because some runs were disqualified for technical reasons.
TABLE 4. Average Relevance Ratios and Recall Figures for Document-based Requests
ABC Version I (Short) ABC Version II (Long) I and II Combined KWIC Title List 
Relevance Recall Relevance Recall Relevance Recall Relevance Recall 


AA 


7 J mi 9) gs «eg cm U7 Gu das cu 7 ces T uu 795 gw o) 
= anao up 9^ do (oom iid m TM am 9) gg 5 x 1 4 i 
C" pazoan Ewa BS My, MBS m oa, PD XD, mL 383 
CUP cn i 5.3 al [ii ap 95 gw 4 gs 48 qe 104 gs 729 550 tr 
Group Averaget * 85.1 (5) a i on 86.2 (a). de (150. de = — (iss) —J (9m (a i) 





*Figures in brackets are recall percentages based on the retrieval of the source document only.
†Figures in parentheses are combinations shown to be not valid. (See section titled “ABC Relevance and Recall Figures.”)
NOTE: Although r is theoretically constant (860), it is reduced in several sets of data because some runs were disqualified for technical reasons.
(3) the precautions taken to eliminate bias, and (4) the
processes of evaluating different elements, in particular
the satisfaction of the different user groups, a comparison
of the systems and their performance will be futile despite
such common designations as Relevance and Recall given
to the measuring rods.

In order to preclude possible misuse and misinterpretation
of our statistical results, we repeat here briefly the
description of the ABC system’s characteristic features
and of the methods employed in the evaluation process,
although the detailed explanations had been given in our
preceding report (2, pp. 1-2, 22-25).

The system provides the scientist with a display of
organized, self-explanatory descriptive phrases as well as
a comprehensive basis for detecting associations between
subjects, disciplines, and ideas pertaining to his problem.
Anticipating a future information system where a scientist
will directly confer with the text (stored in a computer)
by means of communication links, remote teletype
consoles, disc memories, and optical displays, the ABC
dictionary provides the means for direct communication
with the contents of a collection. The retrieval operations,
the selection of keywords in accordance with an
initial formulation of the request, the matching of words
and phrases, the follow-up of leads to additional clusters
of descriptive phrases (as the searcher evolves a better
strategy) are admittedly complex processes, but they are
as rapid as a knowledgeable investigator can make
decisions while scanning the appropriate pages of the ABC
dictionary. Whatever system he is using, the responsibility
for making final decisions is inevitably the investigator’s.
However, in the ABC system, most of these decisions are
made during the initial phases of the retrieval process.

Because the ABC system provides the scientist with
complete freedom to select, reject, and browse, we would
have distorted the operational realism had we imposed
upon our operators an upper or lower limit for the numbers
of documents to be withdrawn from the test collection
or for the length of time to be spent in a retrieval.

Our primary objective was not a performance assessment
of individuals or groups, but an evaluation of the
system, its performance and its capability. Since the
intensity of an operator’s effort is bound to affect an
individual result, the relative number of items recalled may
be an appropriate measure of this human factor; and
with its effect on test results known, it is possible to
discuss the capability of the system.


• Test Results


The average Relevance Ratios and Recall Figures for
the 24 sets of data and for their combinations were
calculated. They are tabulated in Tables 3 and 4 for the
freely styled and document-based requests, respectively.

The average Relevance Ratios for Version I of the
ABC dictionary range between 85.5 percent and 91.7 percent
(88.2 percent total average) for the freely styled
requests; and between 83.2 percent and 87.3 percent (85.1
percent total average) for the document-based requests.
The corresponding results for Version II are 86.3 to
91.4 percent (88.7 percent average) for the freely styled
requests; and 83.1 to 90.3 percent (86.2 percent average)
for the document-based requests. The average of both
versions obtained with freely styled requests is 88.4 percent,
and with document-based requests, 85.7 percent.
The total average using all data, which we consider an
estimate for the ABC Relevance, a system parameter,
amounts to 87.1 percent.

A brief glance at the average Recall Figures shows
proportionately wider ranges: from 19.3 to 28.6 percent,
from 19.6 to 29.5 percent, from 13.0 to 23.5 percent, and
from 17.5 to 22.8 percent. If we were permitted to average
these, we would obtain 23.6, 25.7, 18.7, and 20.7 percent,
or 22.8 percent for all data.

A mathematical model (based on the probability
distribution functions of the variables) and confidence
limits for the final ABC Relevance Ratio and ABC Recall
Figures are discussed in detail in reference (3).⁶


• ABC Relevance and Recall Figures


The ABC Relevance as estimated by the average of
all Relevance data (87.1 percent) is high, and the ABC
Recall Figure, derived from all Recall data (22.8 percent),
is low. This might be interpreted to indicate an
inverse relationship of one to the other. However, the
analysis of our results shows that such a conclusion is
not appropriate. As a first indication, when we (following
the method of the first Cranfield Test) calculated the
average Recall Figure for the returns from the set of 100
requests on the basis of the retrieval of the source
document, we obtained the following scores (Table 4): 46.5,
41, 50, and 35.4 (43.3 average) percent for Version I;
48.5, 50.5, 54, and 48 (averaging 50.3) percent for
Version II; and an over-all average for the two versions
of 46.8 percent. Moreover, comparison of average Relevance
Ratios and Recall Figures for Versions I and II
and the KWIC title list simply does not bear this hypothesis
out. The decrease in average Relevance Ratio with the
KWIC system versus either ABC version is accompanied
by no significant increase in average Recall Figure.


⁶ Confidence bands for Relevance and Recall were determined at a level
of 95 percent. Recall Ratios, however, could be established only for the
data of the constant r subsets.


TEAM RESULTS


Similar trends were found in the analysis of the team
results (Table 3) for the freely styled requests. Because
of the great expenditure of time and money, the analysis
of team effort was limited to the freely styled requests,
and we can propose no reason why the results should be
different for the set of 100 requests. In calculating the
team results, all distinct items retrieved by the four
operators concerning a given request were summed to obtain n,
the number of documents retrieved; duplications were
not counted. The number of documents that were found
relevant was counted. These revised figures were then
used to calculate team Relevance Ratios and Recall Figures,
which are given in brackets in Table 3.
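
The pooling step described above can be sketched in a few lines. The function name, the document identifiers, and the example data are hypothetical; the sketch only assumes that each operator's retrieval for a request is a set of document identifiers and that the relevant documents in the collection are known.

```python
def pooled_team_ratios(retrieved_by_operator, relevant_in_collection):
    """Pool the retrievals of several operators for one request.

    Each distinct document is counted once (duplications are dropped).
    Returns (relevance_ratio, recall_ratio) as fractions.
    """
    pooled = set().union(*retrieved_by_operator)        # n: distinct documents retrieved
    relevant_found = pooled & relevant_in_collection    # relevant documents among them
    relevance = len(relevant_found) / len(pooled) if pooled else 0.0
    recall = len(relevant_found) / len(relevant_in_collection)
    return relevance, recall


# HYPOTHETICAL example: four operators answering the same request
operators = [{"d1", "d2"}, {"d2", "d3"}, {"d4"}, {"d1", "d5"}]
relevant = {"d1", "d2", "d3", "d5", "d6", "d7", "d8", "d9"}

relevance, recall = pooled_team_ratios(operators, relevant)
print(f"team relevance = {relevance:.0%}, team recall = {recall:.0%}")
```

In this made-up case each operator alone recalls at most two of the eight relevant documents, while the pooled team recalls four of them with only one non-relevant document among the five withdrawn.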

The average Recall Figures obtained in this manner 
rose from 23.6 to 55.4 percent for Version I, from 25.7 to 
59.7 percent for Version II, and from 24.7 to 78.4 percent 
for Versions I and II combined (for a team of four making
two measurements each). Average Relevance Ratios
calculated similarly decreased only to 84.5 percent from 
88.2 percent, to 87.8 percent from 88.7 percent, and to 
86.2 percent from 88.4 percent. The first improvement in 
Recall Figures, either for Versions I or II, requires that 
the individual operators produce quite different but 
equally pertinent responses to the same request using 
the same system. The second improvement in Recall 
Figures (Versions I and II combined) requires that some 
of the operators in a second measurement introduce a 
significant number of different but equally pertinent
documents. 

In brief, while recall improved significantly with four 
measurements, and even more with eight measurements, 
the corresponding Relevance Ratios decreased only nomi- 
nally. The inverse relationship is trivial. For practical 
considerations, observed deterioration of the Relevance 
Ratio while improving Recall is entirely acceptable to 
the designer and manager of any system. Furthermore,
statistical analysis presented in this paper has determined 
the significance of differences in relevance ratios. 

Although we expected an improvement of the Recall 
Figures for statistical reasons, the size of the increase is 
considerably larger than we expected. 

One may rationalize, therefore, that the system has a 
high recall “potential” which operators in this test usually 
do not realize. One might further argue that the use of 
a team in such a search is not inconceivable and would
even be practicable. Moreover, several measurements 
must be taken in any system requiring feedback to direct 
the search, and two measurements with this system are 
not excessive. The one point that one might take issue 
with is the validity of the team results in view of the rela- 
tively small size of the collection, and this aspect will re- 
quire further analysis. 

Indications are, thus, and rather fortuitously, that the 
ABC system, having a high Relevance parameter, need 
not necessarily be content with a low Recall Figure. It 
therefore meets requirements stated also by other docu- 
mentalists. Indeed the system can demonstrate high 
Relevance Ratios, while realizing high Recall Figures by 
using (1) teams, (2) additional measurements, and (3) 
perhaps greater effort by an operator, a factor we will 
discuss later. 
ABSTRACT MODEL


In order to explain the absence of the detrimental
inverse relationship between Relevance Ratio and Recall
Figure when we compared individual efforts with team
efforts (having particularly in mind the high rise of
Recall Figures), we prepared a simple model for one
imaginary series of retrieval runs.

Only one major assumption was made in constructing
the model, and this was derived from our test data.
Because the ABC Relevance Ratio had averaged 87.1 percent
and had shown a great consistency throughout the
test, we felt justified in asserting a system relevance
parameter, in this case a conditional probability for a
retrieved item to be relevant of p = 0.80. The number
of relevant documents in the system responsive to the
imaginary request was fixed at 8, and the number of
documents withdrawn in successive or independent
retrieval operations was to vary from 1 to 8, 5, 9, 10, 11, 16,
and 20 as shown on Fig. 1.

It is evident that x (the number of relevant documents
found in each run) cannot exceed the number r (= 8).
Relevance and Recall Ratios, therefore, develop as shown
on Fig. 1.

According to our stipulation, the system operates with
a 0.8 Relevance Ratio that will prevail as long as the
product np = x does not exceed the value of r (= 8).

Once the product attains r = 8 = x, the probability
figure decreases, since x can no longer increase
with larger n. Relevance and Recall Ratio therefore
develop as shown on Fig. 1. The Recall Ratio rises
steadily from its lowest value to the peak, which is reached
when x equals r. In the subsequent runs this ratio
remains unchanged.⁷

Because this abstract model, based upon a general char- 
acteristic of the test data, exhibits its peak relevance 
potential from the start independent of the efforts (as 
measured by n, the number of items withdrawn during 
the given run) that the retrieval operators exert, the 
following conditions prevail: 


1. The Recall Ratio is proportional to the number of
documents retrieved (n), and to the number of
relevant documents located (x), until np = x = r, its
optimum point; and

2. After the optimum is reached, an increase of n must 
result in a deterioration of the Relevance Ratio 
and can provide no improvement in the Recall 
Ratio. 
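
The two conditions above can be reproduced numerically. The sketch below is our own restatement of the abstract model, not code from the study: it treats x as the expected count np capped at r (so x need not be a whole number), and the function name and the sample values of n are illustrative.

```python
def abstract_model(n, r=8, p=0.80):
    """Relevance and Recall Ratio for a run withdrawing n documents.

    Assumes x = n * p relevant documents are found until x reaches r,
    after which x stays at r.
    """
    x = min(n * p, r)
    relevance_ratio = x / n
    recall_ratio = x / r
    return relevance_ratio, recall_ratio


for n in (1, 5, 10, 16, 20):
    relevance_ratio, recall_ratio = abstract_model(n)
    print(f"n = {n:2d}  relevance = {relevance_ratio:.2f}  recall = {recall_ratio:.2f}")
```

Up to n = 10 (where np = r) the Relevance Ratio stays at p and the Recall Ratio grows in proportion to n; beyond that point only the Relevance Ratio changes, and it falls as r/n.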


From these facts we can draw the additional practical 
conclusions, that 


1. A prerequisite for determining ABC Recall Ratio 
is that in addition to employing teams all indi- 
vidual operators make also a reasonable attempt 
to withdraw the properly tagged documents rele- 
vant to the request. 


⁷ Stephen Pollock (4) has independently reached conclusions virtually
identical with those represented by the abstract model.
FIG. 1. Abstract model for a sequence of retrieval runs with increasing n when r = 8 (assumption: p = 0.80)


2. On the other hand, each system should have
built-in features that hinder the operator from
uneconomical retrieval efforts (when he reaches the
x = r level). In the ABC system this feature is
the capability of matching concepts unambiguously.


With the abstract model we gain a better understanding
of the interdependence of the factors n, x, and r and
of the influence exerted in particular on the Recall Ratios
observed in our test.

The model presented is no more than a conceptual
structure, a generalization formed from a few isolated
observations. It is an ideal type as defined by Max
Weber⁸ and as such may be used as a tool to interpret
and compare empirical or experimental data; it is not
an objective in itself. Its validity as an evaluator in this
case must be established through application not only to
some individual performances, but also to the over-all
test.

⁸ Max Weber, Die objektivität sozialwissenschaftlicher und
sozialpolitischer erkenntnis; in: Gesammelte aufsätze zur wissenschaftslehre,
1922, p. 174, 179, 190-191, 194. Max Weber developed his “Idealtypus”
for the purpose of giving the political and social scientist, the historian
and the economist a method or tool with which to identify and
characterize the concrete, causal relations in social or historic life. In
applying his ingenious method to the study of storage and retrieval
processes we are guided by the similarity of problems, in particular the
disturbing variety of manifestations the human element introduces as
author, processor, evaluator, seeker, and searcher of information. Doubts
have been cast on the historians' possible success in reducing observed
reality to definition, organization, evaluation, and comparison; but they
were dispelled. The “Idealtypus,” carefully derived and formed from
the observed significant facts and used as a yardstick, may lead to the
same results in the discipline of documentation. However, we must keep
in mind the basic difference in methodological approach. The historian
employs the model or “typus” to characterize and define the singularity
of a given historic fact (Th. Schieder, Möglichkeiten und grenzen
vergleichender methoden in der geschichtswissenschaft, in Historische
Zeitschrift, v. 200 [1965], pp. 529-551, in particular pp. 544-545, 550-551),
while the documentalist must first determine the variations (for
the better or for the worse) brought about by individuals before he can
identify the performance of the system as such and objectivize its
characteristics for measurement and comparison.

With these requirements in mind, we return to the 
statistical data of our test and retabulate them for a 
suitable and more exhaustive analysis. 


EFFECT OF PARAMETERS 


In a preliminary operation, we organized the 100 docu- 
ment-based requests used in the test according to number 
of relevant documents available in the collection. The 
distribution of requests is shown in Fig. 2. Requests with 
r = 6, 8, 10, and 14 were then selected for analysis; and
their average Relevance Ratios and Recall Figures⁹ were
tabulated by increasing values of n to produce a structure
parallel to the one of our theoretical model. Averages are
recorded on Table 5.

In all the selected groups, the average Recall Figures 
rise with the increase of n. The explanation for this 
relationship is, of course, quite simple. The consistently 
even level of the Relevance Ratio is the outstanding 
characteristic of the ABC retrieval method not only as an 


⁹ The subsequent report (reference 3) proves the statistical validity
of Recall Ratios for groups with identical r's; however, this distinction
has not been made in the text to avoid a possible confusion of the reader.
FIG. 2. Distribution of the 100 requests according to r
TABLE 5. The Influence of n on ABC Recall Figures and ABC Relevance Ratings for Four Different Values of r

               r = 14                 r = 10              r = 8               r = 6
         Avg. Recall   Avg. Relev.  Avg.     Avg.      Avg.     Avg.      Avg.     Avg.
   n     Figures*      Ratios       Recall   Relev.    Recall   Relev.    Recall   Relev.
                                    Figures  Ratios    Figures  Ratios    Figures  Ratios
   0
   1      6.0 (7.0)†    88.2          8       84        13      100        17      100
   2     12.5 (14.3)    80.1         18.5     97.1      21       84        33       99
   3     19.4 (21.4)    82.6         26.3     92.0      34       90.6      39       78
   4     20   (28.6)    83           30       94.5      50      100        52‡      78
   5     36   (35.7)    78.4         43       90.3      63‡     100        67       80.4
   6                                 55‡      90.2      63       84
   7     43   (50.0)    86           52.5     78.8      88      100
   8     53.5 (57.0)    93.6         63       82.7
   9     64   (64.3)    99.6
Average                 87.2                  89.5               94.1               87

* Averages were computed over the combined results for each n.
† Optimum values. The actual average value is 1.8 percent under average optimum value.
‡ These ratios correspond to the pooled recall ratio (55.6%) obtained by a team of four operators testing
Version I of the ABC method with 36 freely styled requests.
over-all average, but for every r-group and, with only
minor deviations, every successive n-group.

The same tabulated test results make it also possible
to solve a problem we have previously raised without
providing satisfactory explanations. We refer to the
Recall Figures obtained by team effort when groups of
four using identical requests of the freely styled set
increased the score to 55.4 (and 59.7) percent (Table 3,
col. 4 and 8). In this particular instance, the Recall
Figure found on the summary table under discussion
(Table 5) amounted to 63 percent. If we make
appropriate adjustments for the different r-values in the
r = 14, r = 10, and r = 6 groups, we arrive at the corresponding
Recall Figures of 53.5, 55, and 52 percent respectively.
Because all the corresponding figures prove to be functions
of n within their particular (r) groups and possess
nearly identical values throughout the system, it is not
the participation of a larger number of persons in the
search (or the larger number of measurements taken)
that brought about these higher Recall Figures, but the
greater number of different documents (n) withdrawn
and used for the calculation.

The average Relevance Ratios and Recall Figures for
runs yielding identical n/r are plotted against n/r in
Fig. 3.¹⁰ As expected, the Recall Figures increase
approximately linearly with n, but become more erratic
as n approaches r. The Relevance Ratio shows a slowly
increasing tendency to decrease. Some of the scatter in
the values as n approaches r can be explained by the
relatively small samples available for averaging in this
region, as is indicated by the 50 percent point. Even so,
this figure illustrates the potential of the system as
realized by various operators. It further illustrates the
obvious relation that so long as Relevance Ratio can be
maintained at a relatively constant value, and n is
smaller than r, the Recall Figure is merely a function of n.

¹⁰ A more complete presentation of the Recall Ratios (so termed
because of the limitation to distinct r groups) obtained by various runs
for various r's is shown in Fig. 4. The lines indicate the optimum
Recall Ratio for the respective r's.

Furthermore, the consistently high Relevance Ratio 
fully explains why the Recall Figures computed from the 
returns of the r=14 group (Table 5, col. 1) are only 1.8 
percent short of the feasible optimum results; and a brief 
analysis discloses that the apparently low ABC Recall 
Figure for the test of the entire ABC method presents 
actually a maximum value, too. As indicated in Table 6, 
the average retrieval run yielded only 1.8 relevant docu- 
ments out of an average of 8.5 relevant titles in the collec- 
tion. The ABC recall figure could or should therefore 
not have been larger than (1.8)/(8.5) or 21.2 percent. 
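
The ceiling quoted above is simple arithmetic on the Table 6 averages. A brief sketch, in which the variable and function names are our own and the input values are the 136-request averages as read from Table 6:

```python
def ratio_percent(numerator, denominator):
    """Return numerator / denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Averages over all 136 requests, as read from Table 6
relevant_per_run_all_trials = 1.8      # relevant documents retrieved, empty trials counted
relevant_per_run_nonempty = 1.97       # relevant documents retrieved, empty trials excluded
retrieved_per_run_nonempty = 2.28      # documents retrieved, empty trials excluded
relevant_in_collection = 8.5           # average r

recall_ceiling = ratio_percent(relevant_per_run_all_trials, relevant_in_collection)
average_relevance = ratio_percent(relevant_per_run_nonempty, retrieved_per_run_nonempty)
print(f"recall ceiling = {recall_ceiling} percent")
print(f"average relevance = {average_relevance} percent")
```

Because an average run withdrew so few documents, the Recall Figure was bounded by the number of relevant documents actually retrieved per run, whatever the quality of the individual choices.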

The observed test data thus approximate those of the
abstract model presented on Fig. 1. The first-generation
ABC system exhibits persistently high Relevance Ratios
independent of n, the number of documents withdrawn
(for n < r), but its Recall Figures are pre-eminently
determined by the size of n.

What remains is a determination of the causes that 
were responsible for the failure to obtain a greater (aver- 
age) number of documents from the collection during the 
test. We base our analysis on the scores made by differ- 
ent operators with identical requests. 

As a representative sample, we have compiled all the 
FIG. 3. The influence of n on ABC Recall Figures and ABC Relevance Ratios for four different values of r
FIG. 4. Recall ratios vs. number of retrieved documents for
requests with identical number of pertinent documents in
collection (lines indicate optimum recall ratio). Vertical axis: Recall Ratio (%); 100 requests.
TABLE 6. Test Results of ABC Methods I and II Combined

Sets of requests    x1*     x2†     n1‡     n2§     r‖
       36           2.34    1.99    2.64    2.25    8.18
      100           1.85    1.70    2.15    1.99    8.60
      136           1.97    1.8     2.28    2.05    8.50

Average relevance ratio: x1/n1 = 1.97/2.28 = 86.4 percent
Average recall ratio: x2/r = 1.8/8.5 = 21.2 percent

* x1 = Average number of relevant documents retrieved (trials without
results not counted).

† x2 = Average number of relevant documents retrieved (trials without
results are counted).

‡ n1 = Average number of documents retrieved (trials without results
not counted).

§ n2 = Average number of documents retrieved (trials without results
are counted).

‖ r = Average number of relevant documents in the collection.


observed ABC Recall Figures for 14 randomly selected 
requests: for seven requests each being related to 15
to 24 relevant documents in the collection; and for an
additional seven requests satisfied by only one to four 
relevant titles (see Table 7). The values are listed by the 
rising number (n) of documents selected and withdrawn 
in the particular run. 

A brief study of the table makes it evident that while 
a few operators utilized the capability of the system 
more extensively (by withdrawing for example up to 10 
documents for requests with a bigger number of corre- 
sponding relevant documents in the collection; and up to 
five documents for the requests addressed to a small 
number of presumably related papers), the majority of 
the participants considered their task completed when 
one to two documents (concerned with the request) had 
been identified. Within this limitation intentionally or 
subconsciously placed on their contributions, the volun- 
teers performed a very good job. In fact their quantita- 
tive output increased in proportion to the decline of r 
(= number of documents relevant to the particular re- 
quest). This can be evidenced by the analysis of the 
results from 60 valid retrieval operations performed only 
with requests for which no more than one single docu- 
ment provided the appropriate or acceptable response 
(see Table 8). In 50 out of these 60 events, the one 
available document was recovered, and a Recall Figure 
of 83 percent was obtained. 


• Discussion


The analysis of the data prevents us from characteriz- 
ing the ABC system by a low Recall Figure. On the con- 
trary, we have shown that it has a relatively high recall 
capability, and we have in addition shown at least one 
means of achieving this consistently. 

Also, we are in no way disposed to attribute low Recall
Figures to a deficiency in the operators alone. The
majority were scientists and engineers who had no previous
familiarity with the system. They volunteered their
services but were not relieved from regular responsibilities.
An early return to the work bench may have been
on the mind of many. At least, such a desire on their
part would be fully understandable. Moreover, the test
proceeded under realistic conditions, where the operators
were merely instructed in the use of tools; they were at
no time urged to exhaust all available avenues and to
pursue all cross references offered by the system, so that
Recall Figures would be as high as possible. Team results
and individual performances indicate that such
could have been done without a significant deterioration
in Relevance Ratios.


TABLE 7. Average Recall Figures Calculated from Multiple Responses to 14 (high and low r)
Requests and Arranged by Increasing n (in percent)


Request
no.       r     Recall Figures by increasing n
135       1     100   100   0
63        2     50    100
50        2     0     75
133       3     33    67    100
42        3     0     27
84        4     25    50    75
45        4     25    42
76        15    7     13    20
74        16    8     9
78        18    (0    45    13
57        17    6     12    17
60        21    5     10    14
53        21    0     3.8   75
80        24    4     8

Other factors contributing to the low average Recall
Figures, as was discussed in the preceding report (2, pp.
26-29), were certain disadvantages of the first ABC
format.


The design of the second-generation model and in par- 
ticular the clearer and more appropriate organization of 
the descriptive sentences in the new ABC dictionary as 
well as the addition of a filter system will facilitate the 
faster acquisition of related subject matter and assure 
the withdrawal of a larger number of documents, which 
should raise the Recall Figures. 

Although we have shown that the ABC system can 
yield high Recall Figures and although we have made 
certain provisions to improve the probability that larger 
figures are obtained, analysis leads us to the conclusion 
that Recall is not an absolute measure of system per- 
formance. With the same set of questions applied to the 
same collection the average r will vary from investigator 
to investigator because of different backgrounds and re- 
quirements, and obviously, the average r will vary from 
one set of requests to another set of requests. 


TABLE 8. Retrievals Resulting from Requests with r = 1*
(Among the 100 Document-Based Requests)











Question Short dict. 
no. la ib 2 a 
n zx n zx "n zx n z 
4 1 1 11 1 1 1 1 
81 Por cpu TA be 
99 3 1. 2 1 1 1 1 1 
121 {ite tit I 1 
125 io us SE d Ad 3 Od 
129 I f qd ot 2 0 = = 
135 21 2 1 1 1 i 1 
136 1 1 1 1 00 00 











pone wich Average scores 
ja ib 2 - 3 ies 
n T n zr n r n x Recall Rele- ` 
vance 
l 1 1 1 1'1 MES: 100 100 
0 0 1 1 l 1 0 0 57 80 
2 1 1 1 1 1 1] 1 100 67 
1 1 3 i 1 1 1 i 88 70 
1 1 1 1 1 1 1 1 100 100 
1 1 L i 2 0 i 1 71 56 
— 1 1 3 0 1 1 86 55 
1 1 1 i 0 0 — = 57 80 
Averages: 82.4% 66% 


In 44 out of 60 valid runs, it was found that x = 1 and n = 1; that is,
Recall = 100% and Relevance = 100%.


* There are 8 requests with r = 1, with 60 valid retrieval runs.


† A dash means no data for evaluation available.
With the limited average effort invested in the test,
it is obvious that the Recall Figures must decrease with
increasing r. According to the returns of the test the
Recall Figures dropped from 82.4 percent for the
document-based requests with r = 1 (Table 8) to 46.6 percent
for those with r = 2. All these considerations lead to the
conclusion that Recall Figures of our test depend on a
great number of variables and subjective factors and
appear to be a very debatable measuring rod for ABC
Recall capability.

The frequently advanced theory of an inverse
relationship between Relevance Ratio and Recall Ratio,
which can be alleviated only by clumsy and most
uneconomical operations, cannot be substantiated by our
organized data. Although individual operators were able
to attain high Recall Figures, their Relevance Ratios in
some cases were also higher, and only seldom were they
lower. Although higher Recall Figures are obtained in a
team approach, Relevance Ratios are not significantly
lowered. Reduced Relevance Ratios with the KWIC
title list were accompanied by no increase in Recall
Figures (Tables 3 and 4) and are furthermore indicative of
the quality of titles when compared with analytic
descriptions.


ea cat Siswa 


If the tested ABC system can be described by a model,
the logical choice is one where we have a high probability
p that a retrieved document is relevant, and the recall
figures are pn/r. Whether n can be raised by individual
operators at will to pn/r = 1 without significant decrease
in Relevance Ratio cannot be answered without a test of
the completed second-generation model. Indications from
this test (team results and individual scores) are that if
data are re-arranged according to increasing effort
measured by n, the Recall Figures have been greatly increased
without a conspicuous deterioration of the Relevance
Ratios.
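
The model quoted above can be made concrete with a small sketch. It assumes that p is the probability that a retrieved document is relevant, n the number of documents retrieved, and r the number of relevant documents in the collection; the function names, the cap at 1, and the value p = 0.85 are illustrative assumptions, not part of the test report.

```python
def relevance_ratio(relevant_retrieved: int, n: int) -> float:
    """Share of the n retrieved documents that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return relevant_retrieved / n


def recall_figure(relevant_retrieved: int, r: int) -> float:
    """Share of the r relevant documents that were actually retrieved."""
    if r <= 0:
        raise ValueError("r must be positive")
    return relevant_retrieved / r


def modelled_recall(p: float, n: int, r: int) -> float:
    """Recall predicted by the model pn/r (capped at 1, an assumption)."""
    if r <= 0:
        raise ValueError("r must be positive")
    return min(1.0, p * n / r)


if __name__ == "__main__":
    p = 0.85  # hypothetical probability that a retrieved document is relevant
    for n in (1, 2, 3):
        # Recall grows with retrieval effort n while p stays fixed.
        print(n, round(modelled_recall(p, n, r=2), 3))
```

Under these assumptions the Relevance Ratio stays at p while the modelled recall rises with n, which is the behaviour the paragraph describes.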


* Conclusion


According to the established procedures of this test,
identical requests were used repeatedly, first of all, to
expose the system to a cross section of the user population.
As a rule, different operators tested the ABC
method eight times with each of 136 standardized requests.
The multiple test produced a greater confidence
in the average Relevance Ratios because of the consistent
scores. It brought to light the inherent deficiency of
Recall as a measure of ABC system performance. It
permitted, through a comparison of the different individual
responses to identical requests, the separation of
characteristics of the storage and retrieval system from
the attitudes or efforts of the operators. This allowed a
more adequate evaluation of the system as such.

The multiple testing method made it also possible to
determine the degree of bias that might be introduced
by the program and the procedures of the test. For example,
the relative independence of the two runs by
identical operators using the same request is corroborated
by the team Recall Figures. The combined scores
of the eight retrieval operations (Tables 2 and 3) had
resulted from contributions of only four individuals who
had tested two versions of the ABC Dictionary. Because
of their success in raising the Recall Figure from 55 to
78 percent through the second run, it is evident that the
same operators located different, but equally relevant
materials, and therefore remained relatively unaffected
by the findings during their first performance.


The request of our statisticians for a sizable test
collection has been apparently vindicated by the results. While
a small collection may be useful in developing a mathematical
model prior to the design of a test, or in testing
a fully automated system, the value of the ABC system
and its components, including the average contribution
of the analysts (or subject cataloguers), could not have
been examined by a representative sample of the user
population and by a representative set of requests. In
other words, the test of a small collection would have
made it difficult to predict the operational capability or
utility of the system within a large and rapidly growing
real collection.

Test results show that the ABC system has a consistently
high Relevance Ratio. During the test, the average
Recall Figures were relatively low. However, analysis
shows that the ABC recall capability is high (in the
majority of cases observed average Recall Figures were
less than 10 percent below the optimum obtainable for
given n and r), and that procedures can provide for
both high relevance and recall.


* Future Work


Instead of pointing to the relatively good and con- 
sistent Relevance Ratios, we wish to direct our future 
attention to the consistent amount by which the system 
errs (14.9 percent of the documents withdrawn are not 
relevant). What are the contributing factors to this over- 
all deficiency: (a) the inability of the searcher to under- 
stand correctly the descriptive sentences in the ABC dic- 
tionaries, (b) the lack of appropriate words or phrases 
used by the analysts in their cataloging, or (c) still other 
elements which require correction of the system? 


Other subjects which will require discussion and study
are: the time factor, the influence of operator group
background on the scores, creation of mathematical
models, the analysis of phrases and sentences for comparisons
and standardization, preparation of SOP's for
the analysts, the automatic production of thesauri,
evaluation of the KWIC title approach, the development
of new testing methods and new yardsticks for the
evaluation of systems, and the comparison of the results
with the results of different tests.


* General Conclusions 


Test results published to date suggest that coordinate
indexing systems are faced with the alternatives of either
“vastly extending the power of the basic coordinate indexing
process or ... replacing this process by an altogether
different one” (5).

Without exception, designers, operators, and evaluators
of coordinate-index type retrieval systems experienced
and acknowledged the unavoidable deficiency of the inverse
Relevance-Recall relationship.

Highly sophisticated methods and computer programs 
as well as computers with giant memories are the pre- 
requisites 12 for future information systems that will 
place the scientist in a position where he (starting with 
a given set of terms or a preliminary formulation of his 
problem) can guide a retrieval operation to a successful 
completion in a personal direct dialogue with the 
computer. 

While such solutions cannot be realized before two,
and probably many more, decades have lapsed, the
ABC system approximates the performance of such systems
today; it can easily be adjusted for exhaustive manual
retrieval acceptable to the scientist with respect to
flexibility, quality, and speed of output; it can also,
without large expenditures in time and dollars, provide: (a)
fully automated retrieval runs on the basis of Boolean
combinations and probably the application of a vector
method, and (b) mechanized production of a dictionary
or thesaurus. Future research will eventually lead to the
automatic standardization of terminology and syntax as
well as to mechanized semantic retrieval.


12 The SMART system with its multiple and consistent approaches is an
outstanding example for a program moving toward the final objective
(6, 7).
Brief Communications


Bradford's Law and the Keenan-Atherton Data


Bradford's methods are applied to the Keenan-Ather- 
ton data. The results do not fit Bradford's Law. 


Bradford's Law (1) states: “If scientific journals are
arranged in order of decreasing productivity of articles on
a given subject, they may be divided into a nucleus of
periodicals more particularly devoted to the subject and
several groups or zones containing the same number of
articles as the nucleus, when the number of periodicals in
the nucleus and the succeeding zones will be as 1:n:n² . . .”

Bradford's curve is drawn by plotting the running sums of
references against the logarithms of the running sums of
titles. The nucleus is defined as the initial curved part of
the plot, and the succeeding zones as segments of the
remaining straight portion of the plot with each zone having
the same number of references as the nucleus. The curve for
the zones has to be a straight line if the ratio 1:n:n² . . .
is to hold.
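
As an illustration of the construction just described, the following sketch computes the plotted points (logarithm of the running sum of titles against the running sum of references) and the number of titles each zone would need under the ratio 1:n:n². The input figures are the zone sizes quoted in this communication for the Keenan-Atherton data (9, 25, and 371 titles; cumulative references 6,581, 13,456, and 20,287). The function names are hypothetical, and taking n from the nucleus and first zone is an assumption made for the sake of the example.

```python
import math

# Zone sizes quoted for the Keenan-Atherton data.
titles_per_zone = [9, 25, 371]
cumulative_references = [6581, 13456, 20287]


def running_sums(values):
    """Return the cumulative sums of a sequence."""
    total = 0
    sums = []
    for value in values:
        total += value
        sums.append(total)
    return sums


def bradford_points(titles, cum_refs):
    """Points of the Bradford curve: (log10 of running titles, running references)."""
    cum_titles = running_sums(titles)
    return [(round(math.log10(t), 3), refs) for t, refs in zip(cum_titles, cum_refs)]


def expected_titles_under_bradford(titles):
    """Titles per zone if the zones followed 1:n:n^2, with n taken from zone 1."""
    nucleus, zone1 = titles[0], titles[1]
    n = zone1 / nucleus
    return [nucleus * n ** k for k in range(len(titles))]


points = bradford_points(titles_per_zone, cumulative_references)
expected = expected_titles_under_bradford(titles_per_zone)

if __name__ == "__main__":
    print(points)
    print([round(x, 1) for x in expected])
```

The logarithms come out as 0.954, 1.531, and 2.607, the values marked on the horizontal axis of Fig. 1. With n fixed this way, the second zone would hold about 69 titles under the ratio, against the 371 observed.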

The Keenan-Atherton data (2) was plotted in Bradford's
manner, and is shown as Fig. 1. If we take Bradford's
criterion for the nucleus as the curved section of the plot,
and his condition that the succeeding zones contain the
same number of references as the nucleus, we get a fair
approximation by setting the end of the nucleus at 6,581
references, the boundary between the first and second zones
at 13,456 references [...]

The curve does not, however, support Bradford's ratio
1:n:n² . . .; to have done so, it would have to have taken
the dotted extension shown on the figure. The reason for
the deviation lies in the 10 percent higher number of titles
having the minimum number of references in the
Keenan-Atherton data. This is shown in Table 1.

The Keenan-Atherton study would therefore seem to
indicate that Bradford underestimated the percentage of
titles having a minimum number of references, and therefore
drew an invalid ratio.


Fig. 1. Bradford curve for the Keenan-Atherton data. Horizontal axis:
logarithms of running sums of titles (marked at 0.954, 1.531, 2.607).
Vertical axis: running sums of references (marked at 6,581, 13,456, 20,287).


Table 1. Distribution of titles and references between nucleus and two zones.
Figures in parentheses are percentages.

            Bradford     Bradford     Keenan-      Bradford     Bradford     Keenan-
            geophysics   lubrication  Atherton     geophysics   lubrication  Atherton
            titles       titles       titles       references   references   references

Nucleus     9 (2.8)      8 (4.9)      9 (2.3)      429 (32.2)   110 (27.8)   6,581 (32.4)
Zone 1      59 (18.1)    29 (17.7)    25 (6.2)     499 (37.5)   133 (33.7)   6,875 (33.9)
Zone 2      258 (79.1)   127 (77.4)   371 (91.0)   404 (30.3)   152 (38.5)   6,831 (33.7)


References

1. Bradford, S. C., Documentation, Public Affairs Press, Washington.
2. Keenan, S., and P. Atherton, The Journal Literature of
   Physics, American Institute of Physics, New York, 1964.


OLE V. GROOS
AFCRL Research Library
Bedford, Massachusetts
Solution of Boolean Equations
Through Use of Term Weights
to the Base Two


In a recent communication by Brandhorst (1), a method
of weighting is presented for the solution of Boolean
equations. In the paper, Brandhorst notes that “... though
weighting had its advantages, nevertheless there were
some equations that could not be reduced in this way.”

There is an alternative to the type of term-weighting that
he presented. While not satisfying a simple (single)
[...], the technique does afford a unique solution to
a Boolean equation when used in conjunction with a
table-lookup following assessment of the truth-value of each term
in the set.

By assigning a series of weights, 2⁰, 2¹, 2², ..., 2ⁿ⁻¹ (for
n = number of elements in the set), to each element in the
set, a unique sum is guaranteed regardless of which subset
is found true. A simple table-lookup following the summation
with the attained sum as the argument can then
locate a branch to an appropriate part of the program. If
the sums (equivalent to “truth” in the equation) are few,
comparisons against the obtained sum may be faster than
table-lookup procedures.

For example, a set of four elements, A, B, C, D, has a
maximum of 2⁴ = 16 possible combinations. Assigning
weights of A = 1, B = 2, C = 4, D = 8 will yield a set of
sums taking on all values from 0 to 15 with each sum
uniquely identifying its own subset.
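
The uniqueness claim for the four-element example can be checked directly. The sketch below is an illustration only; the helper names `binary_weights` and `subset_sum` are hypothetical and do not come from the communication.

```python
from itertools import combinations


def binary_weights(elements):
    """Assign the weights 2^0, 2^1, ..., 2^(n-1) to the elements in order."""
    return {name: 2 ** i for i, name in enumerate(elements)}


def subset_sum(true_elements, weights):
    """Sum of the weights of the elements that are found true."""
    return sum(weights[name] for name in true_elements)


elements = ["A", "B", "C", "D"]
weights = binary_weights(elements)

# Enumerate every subset of the four elements and record its sum.
all_sums = sorted(
    subset_sum(subset, weights)
    for size in range(len(elements) + 1)
    for subset in combinations(elements, size)
)

if __name__ == "__main__":
    print(weights)
    print(all_sums)
```

The sixteen subsets produce sixteen different sums, so a sum identifies exactly one subset.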

The method is illustrated using the same equations, in
the same order, as those presented by Brandhorst. The
weights of 1, 2, 4, 8 applied to A, B, C, and D are constant
throughout all equations. The conditions for truth are
presented as the sum (S) immediately to the right of each
equation.


(1) de Sz3,8—5,8-—7,8-9, 
5z11,92218,8 = 15 

(2) ALBIC+D S == 0. 

3) AJB- C-D S = 15. 

(4) AB: C - D) S= 1, 8 = 14, S = 18 

(5) (AF B)--(C - D) SA0, 8944, 895 8 

(600 (A+ B) - (C + D) S213. 

(7) (A + B) - (C + D) 5 S B whore 8 8 and Sx 12. 

(8) (4 + B)--(C - D) §=-3,8=7,8 > 11 









The method for evaluation of the resultant S must, of
course, depend upon the complexity of the range of truth
values; equation (1) is obviously best evaluated by a
table-lookup while equations (2), (3), and (6) can be handled
by simple comparison.

This technique is often used in computer programming
where, for example, any of 2ⁿ different subroutines
(branches) must be taken depending on the truth subset
of n elements. Use of a continuously bifurcating tree-structure
approach can require as many as 2ⁿ tests in order to
locate the appropriate branch. The weighting system
described here will require n tests, n summations (if all
elements are true), and a table-lookup.
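
A minimal sketch of the table-lookup step follows, assuming the weights A = 1, B = 2, C = 4, D = 8 used above. The equation (A + B) · (C + D) is chosen purely for illustration, and the names `weighted_sum`, `build_truth_table`, and `evaluate` are hypothetical.

```python
WEIGHTS = {"A": 1, "B": 2, "C": 4, "D": 8}


def weighted_sum(truth):
    """n tests and at most n summations: add the weight of each true element."""
    return sum(weight for name, weight in WEIGHTS.items() if truth[name])


def build_truth_table(equation):
    """Precompute the set of sums S for which the equation is true."""
    table = set()
    for s in range(2 ** len(WEIGHTS)):
        # Decode the sum back into the truth value of each element.
        truth = {name: bool(s & weight) for name, weight in WEIGHTS.items()}
        if equation(truth):
            table.add(s)
    return table


def example_equation(t):
    """Illustrative equation: (A + B) . (C + D)."""
    return (t["A"] or t["B"]) and (t["C"] or t["D"])


TRUE_SUMS = build_truth_table(example_equation)


def evaluate(truth):
    """Evaluate the equation with one summation pass and one lookup."""
    return weighted_sum(truth) in TRUE_SUMS


if __name__ == "__main__":
    print(sorted(TRUE_SUMS))
    print(evaluate({"A": True, "B": False, "C": True, "D": False}))
```

For this equation the table holds every sum from 5 to 15 except 8 and 12. A range comparison with two exclusions would serve equally well here, which is the trade-off between lookup and comparison that the text mentions.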
A Decentralized National Chemical
Information System



Many witnesses of the chemical information scene appear 
to agree that the inevitable National Chemical Information 
System is doomed to a short life if the hard copy of all 
the literature must be stored in a geographically single, 
monolithic structure. 

The proposal that follows is an alternative. There should 
be a minimum of centralization of the document collections. 
Use the current university collections, supplemented by the 
Center for Research Libraries, the CAS library, and others. 

During the past decade a large amount of discussion and
a significant amount of work have centered around the
construction of bibliographic and search techniques, both
theoretical and practical, both experimental and operating, with
and without regard to cost. Not all of the problems have
been solved for handling structures of chemicals, for
searching concepts, for formulating questions, etc. Nonetheless,
the burden of interest has largely neglected the final step of
presenting the requestor with hard copy, accurately,
promptly, and cheaply.

We suggested a solution obliquely in 1961 (1). The concept
of a National Chemical Information Center was not
being discussed in 1961. The technical information problem
had not then been called to the attention of the President
of the United States. There was no COSATI. The Chemical
Abstracts Service had only begun its research program. Few
libraries had computer programs. But the need for the
presentation of hard copy, the journal article, to the
requestor was at that time, and still is, the bottleneck in the
transmission of technical information (2).

All of the large chemical libraries in universities in the
country could be linked by wire service to the major
searching services. When a search has produced its output of
references, the library nearest the requestor will be notified
of the references needed immediately, and for a small fee
will photocopy them and mail them promptly to the
requestor. Alternatively, the requestor could order needed
references directly from an identified, certain source and
expect to receive them by return mail.

Every known document which conveys new chemical 
information would be catalogued and available to this wire 
service. If some of the 10,000 journals covered by Chemical 
Abstracts were not subscribed to at all in the United States, 
a small library could be formed at the site of the searching 
service to guarantee that all original articles in chemistry, 
including the “obscure,” were available. 

We recognize that the economics of interlibrary loans and
photocopies are currently not in favor of the lending or
copying library. We recognize also that there are
inefficiencies in university libraries today that could be corrected
with an initial capital outlay for the long term benefit. A
Federal subsidy to these university libraries would initiate
the program and would be continued only so long as it
functioned as an incentive to the libraries' administrations
to improve their efficiencies through innovation.

Those libraries failing to justify continued subsidy would
be dropped from the National Chemical Information System,
and other university collections would be sought to fill
the need of supplying copies of any articles that have
appeared in the chemical literature.


An example of how this concept might work is as follows:
Some Federal agency, such as the National Science Foundation,
might contract with the Chemical Abstracts Service
to be responsible for searching the world's chemical
literature, with the aid of computers if necessary, and for
maintaining administrative control of the wire service system to
an adequate number of university libraries throughout the
fifty states to guarantee:

1. That a copy of every article is available, no matter
   how obscure.

2. That through the sale of tokens at a fixed price, the
   [...] costs for the [...] would be
   borne by all requestors, whether government,
   foreign, nonprofit, or commercial.

3. That prompt service be maintained for requestors.

4. That there would be a continuing competition among
   the chemical libraries of the country to qualify as a
   member of the National Chemical Information System
   and for its subsidy.
Letters to the Editor


Dear Sir:


Regarding the Brief Communication by W. T. Brandhorst,
“Simulation of Boolean Logic Constraints Through
the Use of Term Weights” [American Documentation, July
1966], the “newly realized relationship” between Boolean
expressions and term weights has been known for quite a
few years and has been the subject of intensive work for
the past five years in the field of switching theory.

The “weights” and “weight limit” of the Communication
correspond to the “weight” and “threshold” of threshold
logic devices; the switching theorist calls Boolean functions
that can be synthesized by a single threshold device
“linearly separable functions.” The “group weight” procedure
described in the Communication corresponds to the
use of multiple threshold devices for the synthesis of
functions which are not linearly separable.

Many of the papers in the field appear in the Institute
of Electrical and Electronics Engineers (IEEE) Transactions
on Electronic Computers, and are concerned with synthesizing
Boolean expressions in terms of the fewest number of
threshold devices. The topic is also covered in recent
textbooks on switching theory.
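
The correspondence can be illustrated with a short sketch. A single threshold device is taken here to output true when the summed weights of its true inputs reach the threshold; the brute-force search over small integer weights, and the names `threshold_output` and `find_single_threshold`, are assumptions made for illustration and are not a method from the switching-theory literature cited.

```python
from itertools import product


def threshold_output(inputs, weights, threshold):
    """A threshold device: true when the weights of the true inputs reach the threshold."""
    return sum(w for x, w in zip(inputs, weights) if x) >= threshold


def find_single_threshold(function, n_inputs, search=range(-3, 4)):
    """Search small integer weights and thresholds for one device realizing `function`.

    Returns (weights, threshold), or None if nothing in the searched range works.
    """
    rows = list(product([False, True], repeat=n_inputs))
    for weights in product(search, repeat=n_inputs):
        for threshold in search:
            if all(
                threshold_output(row, weights, threshold) == function(*row)
                for row in rows
            ):
                return weights, threshold
    return None


# A AND B is linearly separable; A exclusive-or B is not.
and_solution = find_single_threshold(lambda a, b: a and b, 2)
xor_solution = find_single_threshold(lambda a, b: a != b, 2)

if __name__ == "__main__":
    print(and_solution)
    print(xor_solution)
```

The search finds weights for the conjunction but none for the exclusive-or, which is the standard example of a function needing more than one threshold device.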
O. FIRSCHEIN and M. FISCHLER
Electronic Sciences Laboratory 
Lockheed Missiles & Space Company 
Palo Alto, California 


Dear Sir: 


My reaction to the letter from Messrs. Firschein and 
Fischler was one of personal embarrassment that I should 
have been unaware of the work they described. However, 
upon examining the references cited by them I realized that 
we apparently have in this case an excellent example of a 
failure of information transfer between two areas of 
endeavor. 

The important thing is not that the relationship between
logical expressions and [...] expressions was known and
used in other fields such as switching theory, but that it was
not, as far as I could ascertain, being applied anywhere in
document-retrieval efforts. There was no claim that the
basic concept was totally new, but that it did not appear
to have been explicitly realized in the information-technology
field and that this was quite surprising considering
the length of time that field has been using Boolean Logic
search strategies.

When we first hit on the concept described in my Brief
Communication, which appeared in the July 1966 issue of
American Documentation, I did a literature search in an
attempt to discover what other workers in the information-retrieval
field might be using the same technique. I didn't
find any. Nor did I find any papers at all on the subject in
American Documentation, Special Libraries, or any other
journal in the documentation field. Textbooks such as
Becker & Hayes' Information Storage and Retrieval: Tools,
Elements, Theories likewise were silent on the subject. I
sent the matter around to several workers in the field and
all indicated the idea was new to them.

Apparently few of us do any reading in the field of switching
theory, or are able to extrapolate from the terms and
[...] of that field to our own. Now two gentlemen
from that field point out that synthesizing Boolean expressions
in terms of the fewest number of threshold devices is
a topic of [...] concern with them and hardly new.

What I would now like to know is just when did awareness
of this relationship enter the field of documentation?
My literature search may not have been thorough enough.
It was obviously too restricted in scope. Perhaps other
workers are making use of the concept. (That is one reason
it was made a Brief Communication instead of an article.)
I would like to hear from anyone who might have information
on this subject. If, however, our use of the concept
should represent its first explicit appearance in this field,
then we have an interesting time lag to explain.


W. T. BRANDHORST 
Documentation Incorporated 
Bethesda, Maryland 


Dear Sir: 


In his Letter to the Editor in the July 1966 American
Documentation (p. 148), Mr. Robert Jordan argues for the
use of full forenames, rather than merely initials, in citation
[...] and other large personal author listings, e.g., Robert
[...] Jordan rather than R. T. Jordan.

The basic justification he gives for full forenames is that
this practice would permit users to discriminate between
individuals having common last names and first names
beginning with the same letters, e.g., “Richard Jones,”
“Robert Jones,” and “Raymond Jones” are not all reduced
to “R. Jones.”

This argument assumes, however, that in every appearance
of a particular author's name, one encounters the same
pattern. In the field of the technical report literature such
an assumption is decidedly not valid. Whether because of
the “corporate” nature of their production or the welter of
sign-offs these documents frequently go through, there is in
actual practice little consistency in the way given personal
names appear on technical reports.

What this means is that if forenames were extracted as 
found in actual technical reports, and repeated in large 
indexes, there would be a tendency to separate indexing 
entries for the same author. Let us take, for example, the 
name Robert Thayer Jones. By overstating our case slightly 
we might postulate the following occurrences: 


R. T. Jones               R. Thayer Jones
R. Jones                  Robert Jones
Robert T. Jones           (Not to mention Thayer Jones,
Robert Thayer Jones       T. Jones, Bob Jones)


When you have a lot of Joneses, an “R. Jones” can file a
great distance from a “Robert Jones.” The use of initials
alone in the preceding example would reduce six file points
to two: “R. Jones” and “R. T. Jones.” The use of initials
is therefore one way of combating the inconsistencies met
with in the report literature and providing the searcher with
fewer places to look. This, of course, must be weighed
against Mr. Jordan's objection that it runs together all
the “R. Jones” entries without discriminating the Roberts from
the Richards. This is true and constitutes a valid objection.
In our own system we have found that the improved
searchability of the file overrides the above objection and that
one can usually tell from the subject matter of the report
or the various sub-entry [...], such as the report
number or the corporate source, whether the “R. T. Jones”
index entry being consulted is the “Robert Thayer Jones”
of interest.
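
The reduction from six file points to two can be reproduced with a short sketch. The function `to_initials` is a hypothetical normalizer written for this illustration; it assumes the surname is the last word of the name and is not a description of any indexing system mentioned in the letter.

```python
def to_initials(name: str) -> str:
    """Reduce every forename to an initial, keeping the surname (assumed last)."""
    *forenames, surname = name.replace(".", " ").split()
    initials = " ".join(f"{part[0].upper()}." for part in forenames)
    return f"{initials} {surname}".strip()


# The six forms of the name postulated in the letter.
occurrences = [
    "R. T. Jones",
    "R. Jones",
    "Robert T. Jones",
    "Robert Thayer Jones",
    "R. Thayer Jones",
    "Robert Jones",
]

file_points = sorted({to_initials(name) for name in occurrences})

if __name__ == "__main__":
    print(file_points)
```

The six forms collapse to the two entries named in the letter, “R. Jones” and “R. T. Jones”; the cost, as the letter concedes, is that a Richard Jones would file under the same entry.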


W. T. BRANDHORST 
Documentation Incorporated 
Bethesda, Maryland 
Book Reviews


1/67-1R On Retrieval System Theory. 2d ed. 1965.
B. C. Vickery. Butterworth, Washington, D. C.
191 pp.

On Retrieval System Theory, 1961, represented the first
serious attempt at a reasonably comprehensive overview
of the field of information storage and retrieval. It was a
much needed work, invaluable to students in schools of
library and information science and also to the many
individuals entering documentation from other areas of
endeavor. In reviewing the book (Journal of Documentation,
December 1961) R. Meetham stated: “a
[...] ... would have [...] saved six months, spent
mainly on the retrieval of information about retrieval, if
the book had been ready earlier. No one else now needs
to enter the field through quite such a prickly hedge.”

The four years elapsing between the first and second
editions of On Retrieval System Theory saw the publication
of other texts on the subject, notably by Becker and Hayes
(Wiley, 1963), Bourne (Wiley, 1963), Kent (Interscience,
1962), Sharp (London House, 1965), and Williams (Business
Press, 1965). The second edition of Vickery can now be
reviewed in relation to other attempted surveys of the field.

Robert Fairthorne, writing in 1958, stated that “indexing
is the basic problem, as well as the costliest bottleneck of
information retrieval.” Seemingly, however, the importance
of the indexing operation (which, in a broad sense,
encompasses the surrogation of documents and of requests and
the creation of a search file to allow the matching of
document surrogates against request surrogates) has been
overlooked in many quarters. The most publicized and most
cited texts on the subject (those of Becker and Hayes and
of Bourne) are largely concerned with file organization and
with methods of physically implementing a retrieval system.
They pay scant attention to the factors that importantly
affect all retrieval systems, whether precoordinate or
postcoordinate, manual or mechanized, namely: the size of
the document classes defined by the index language, the
extent to which document subject matter is recognized in
indexing and is translated into the language of the system,
and the strategies by which requests are matched against
the file of document surrogates.

Vickery's text, fortunately, is concerned with “the general
principles of design and operation of systems for the
selection of documents containing information.” It deals largely
with factors basic to all retrieval systems: the surrogation
of documents and requests; the structure and characteristics
of index languages and the devices they incorporate to
broaden or restrict class definition; file organization and
searching strategies; and performance evaluation.
The only chapter devoted to mechanization per se
discusses in a general way the extent to which equipment can
be applied to the various stages of the total storage and
retrieval process.

The second edition follows closely the outline of the first, 
although virtually all chapters have undergone considerable 
revision and updating. In particular, Vickery has taken full 
account of experimental work carried out in the area of 
automatic surrogation of documents. The text and bibliog- 
raphies reflect a considerable amount of study and syn- 
thesis of the literature of documentation. 

From the point of view of the student, using Vickery as a
basic text, the organization of the work could undoubtedly
be improved. In particular, Chapter 4, “Descriptor
Languages,” which jumps back and forth between precoordinate
and postcoordinate systems in its discussion of devices used
to broaden class definition (and thus improve recall) or
restrict class definition (and thus improve precision) might
well prove confusing to the reader not thoroughly familiar
with the evolution of modern retrieval systems. Moreover,
some of Vickery's statements are not strictly accurate. For
example, a conventional card catalog is not an item-entry
system, even though a single catalog card, containing
subject tracings, may be regarded as a unit record for a
document. Rather, a card catalog, arranged by classification
scheme or alphabetical subject headings, is a term-entry
system. We make our first approach to such a system
by going directly to the terms or class labels that best
represent the subject matter we are seeking. We do not
search such a file item by item except as an additional
screening process within the document classes we have
chosen to consult initially. In other words, we search a
card catalog by consulting selected term columns in
Vickery's item-term matrix, not by scanning the item rows of
the matrix.
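
The distinction between consulting term columns and scanning item rows can be sketched as follows. The three documents, their terms, and the function names are invented for illustration; they are not taken from Vickery's text.

```python
from collections import defaultdict

# A tiny item-term matrix, stored by rows: each item lists its index terms.
items = {
    "doc1": {"indexing", "recall"},
    "doc2": {"recall", "thesaurus"},
    "doc3": {"indexing"},
}


def item_entry_search(item_file, term):
    """Item-entry approach: scan every item row and test it for the term."""
    return sorted(doc for doc, terms in item_file.items() if term in terms)


def build_term_entry_file(item_file):
    """Invert the matrix: one entry (column) per term, listing its items."""
    term_file = defaultdict(set)
    for doc, terms in item_file.items():
        for term in terms:
            term_file[term].add(doc)
    return term_file


def term_entry_search(term_file, term):
    """Term-entry approach: go directly to the chosen term column."""
    return sorted(term_file.get(term, ()))


term_file = build_term_entry_file(items)

if __name__ == "__main__":
    print(item_entry_search(items, "indexing"))
    print(term_entry_search(term_file, "indexing"))
```

Both searches return the same documents. The difference is the access path: the item-entry search touches every row, while the term-entry search, like a reader at a card catalog, goes straight to one heading.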

On Retrieval System Theory is not a book for the
gadgeteer. Nevertheless, it remains the best available text
on the intellectual aspects (of indexing, index language, and
searching strategies) fundamental to the design and
operation of a successful retrieval system.


F. W. LANCASTER
Information Systems Evaluator
National Library of Medicine


1/67-2R L’Automatisation des Récherches Documen- 
taires: Un Modèle Général —le Syntol. 1964. R. C. Cros, 
J. C. Gardin, and F. Lévy. Gauthier-Villars, Paris. 200 pp. 


This is a report of several years' work on the part of these
French collaborators to formulate a new development in
automatic documentation. By definition, the system is
bound neither to a restrained scientific area, nor to a unique
type of document analysis, nor to a single format. The
system, called “Syntol” or “Syntagmatic organization
language,” includes a group of logical and linguistic rules which
permit the retrieval of information and facilitate its
manipulation by means of electronic calculators. There is a
discussion, in English, of Syntol in the Rutgers series on
Intellectual Systems for the Organization of Information.

Syntol is therefore an artificial language, and it is
discussed in this sense. Gardin wrote all chapters except
Chapter 4 and Chapter 5, which were written by Cros and
Lévy respectively.

The word “syntagme” is one which is taken to include not
only “key words” or “descriptors” but also their structural
relationships. A dual organization is established for these
descriptors. Generic relationships are specified by a
“paradigmatic” arrangement, while nongeneric relationships are
provided by the “syntagmatic” arrangement.

A piece of information is shown graphically with lines,
with circles, and frequently with arrows. It is possible to
compare graphs in such a way that varying quantities and
types of information can be retrieved.

While the system may be complex, making the first
reading difficult for those with a meager background in
mathematics, the book is well written, and the reading, rewarding.


PAULINE M. VAILLANCOURT
Memorial Sloan-Kettering
Cancer Center

1/67-3R 
tifique. 1964. J. C. Gardin 
Gauthier-Villars, Paris. 269 pp. 


In 1962 the Centre National de la Recherche Scientifique 

angaise set up an award of 10,000 French franca, intended 

bring forth new ideas in the field of scientific documen- 

ion. The rules governing the “Grand Prix de la docu- 
mentation scientifique” are abstracted and presented in 
thas volume. 

e book consists of three papers whose authors were 
selected by a jury to share the award equally. The papers 
are quite different in approach, as can be expected from 

se authors, well-known to documentalists, who have di- 
versified orientations and primary interests. 


L'Organization de la Documentation Scien- 


Ps C. Gardin presents the most complete report to that. 


on his “Syntol,” or “Syntagmatic organization lan- 


guage,” for which he is best known. He considers all 


ects of a National Center of Scientific Documentation, ' 


, though his chief interest is in Syntol and the manipu- 
lation of information for retrieval, he nevertheless touches 
on, the financial and organizational problems involved in 
the establishment of such a center on a national level. This 
essay follows the publication of “L’Automatisation des 
Récherches Documentaires." 

ù. de Grolier, in collaboration with Calvin Mooers, gives 
& eral review of a new approach to organisation of a 
large-scale center on scientific documentation as compared 
with using the existing diversified approaches throughout 
thé world. The author concedes that his presentation is 
perhaps more schematic than is desirable, but he insists 
that it is best to begin anew rather than to adapt from any 
dogumentation centers now in existence and thus be trapped 
into solving their problems. 

veral sections are proposed for this center: a depository 
for |rarely used material; a section for commercial repro- 
duction of documents for distribution; a center for coordi- 


nating and distributing abstracts and indexes; a center for ` 


automatic treatment of information; & center for informa- 
tion about scientific research “in progress” as well as pub- 
lished research; and a center for the publication of bibliog- 
raphies automaticallv by progressively better computer 
techniques. 

F. Levéry, an engineer with IBM, France, treats in some
detail the mechanical aspects of documentation and of the
formulation of a thesaurus.


Generally, this book is of interest to those who are
involved in planning documentation centers. The article by
de Grolier especially shows an awareness of the related
literature, sometimes to prove a point, sometimes to cite
the item to which the author takes exception.


PAULINE M. VAILLANCOURT
Memorial Sloan-Kettering Cancer Center


, E. de Grolier and F. Levéry.. 


1/67 4R The Politics of Research. 1966. Richard J.
Barber. The Public Affairs Press, Washington, D. C. 167 pp. 


This book is an exposé of the politics of research in the
United States. To quote university-professor Barber:
“Although Research and Development now involves annual
expenditures of about $21 billion of which more than
two-thirds comes from federal funds, science retains such an
aura of mystery that the scientific community has been free
from the close scrutiny and skeptical appraisal that we
typically regard as characteristic of the American political
process.” Barber proceeds to describe the magnitude and
characteristics of research in this country, primarily
government sponsored and subsidized and technology oriented,
and the problems which are created by it. The government
dominates research, while handing over much of the control
of the research and development of projects to private
business organizations, on which it relies for decisions.
These organizations, in turn, keep their records confidential
even, in some cases, from the government itself. Delicate
questions thus have been placed in the hands of persons who
are only very indirectly accountable to those who
ultimately “foot the bill.”

Government-sponsored research suffers from inefficiency
and disorganization, and money is sometimes appropriated
to two separate organizations studying the same problem.
No government agency is charged with overall study and
planning of this research, and thus it has grown like Topsy.

Most research is applied and practical, dealing with missiles
and weaponry instead of pure science and civilian-oriented
ideas. It is dominated by NASA and DOD. Certain other
countries have made faster progress on civilian problems,
and those in the social sciences, and have provided greater
corporate subsidy for research. Furthermore, research funds
are not divided equitably on a geographic basis; the
midwest has been shortchanged, even though it produces most
of the Ph.D.’s in the country. Concentration of research
funds in a few industries and universities makes the already
large and powerful even more so, and it has not benefited
the smaller and weaker. The concluding chapter
summarizes the changes which should be carried out to improve
the situation and bring it under congressional and citizen
control. The book is concluded with 20 pages of references,
though many citations and examples are not footnoted.

It is hard to know how accurate a picture is painted
here, since much of this information is confidential and
hard to find, but most of it is probably accurate. The
generalizations are often sweeping and sometimes opinionated,
and the book is sure to be criticized by those on the inside,
though it will probably have little effect on government
policy. Nevertheless, it is a useful compilation of facts
documenting a picture already well-known, and it should
prove useful in college and public libraries to those
interested in the world of research.


JOHN F. HARVEY, Dean
Graduate School of Library Science
Drexel Institute of Technology


Wiley presents
the first volume of an important new series
Sponsored by the ADI


ANNUAL REVIEW OF
INFORMATION SCIENCE AND TECHNOLOGY


Volume I
Edited by CARLOS A. CUADRA, System Development Corporation


The first volume in a new series devoted to consolidating the latest
developments in the growing field of information science and technology
including the generation of information, and its transformation,
communication, storage, retrieval, and use.

The Annual Review series, sponsored by the American Documentation
Institute, will provide full evaluation of accomplishments through a
comprehensive, constructive review of current topics. A long-range goal of the
series is to encompass the larger communication processes in which
documentation plays a leading role. For this reason, the authors will examine
not only the literature dealing specifically with information science but also
that concerned with related aspects in psychology, sociology,
communication, engineering, management, and business. The series will not merely
reflect or cater to current interests; it will attempt also to broaden and
deepen them.

Volume I covers literature published in 1965 and is divided into twelve
major areas, each explored by one or more recognized experts on
information systems and services. The first two chapters deal with the purpose
of information activities; Chapter 3 focuses attention on the study of
behavior and the experiences of scientists and technologists confronting
“information channels.” Chapters 4, 5, and 6 deal with the core of technical
problems in the field: the analysis of expressions in natural language and
the manipulation within a computer of symbols representing these
expressions.

Index system evaluation, hardware and man-machine communication
developments are covered in the next three chapters. This is followed by
comprehensive discussions of applications in chapters 10, 11, and 12. The
aim of Chapter 13 is to help provide a basis for an effective national
information system. Volume II, scheduled to appear in the fall of 1967, will
cover 1966 literature.


Contents of Volume I:


Foreword (Helen L. Brownson); Introduction to the ADI Annual Review (Carlos
A. Cuadra); Professional Aspects of Information Science and Technology
(Robert S. Taylor); Information Needs and Uses in Science and Technology (Herbert
Menzel); Content Analysis, Specification and Control (Phyllis Baxendale); File
Organization and Search Techniques (Douglas Climenson); Automated
Language Processing (Robert F. Simmons); Evaluation of Indexing Systems (Charles
P. Bourne); New Hardware Developments (Annual Review Staff); Man-Machine
Communication (Ruth M. Davis); Information System Applications (Jordan
Baruch); Library Automation (Donald V. Black and Earl Farley); Information
Centers and Services (G. S. Simpson and Carolyn Flanagan); National
Information Issues and Trends (John Sherrod); Index (Pauline Atherton).


1966. 389 pages. $12.50.
Order from your bookseller or
JOHN WILEY & SONS, Inc.
605 Third Avenue New York, N. Y. 10016

If you're in this group,
you'll not want to miss this


Informative 
Two-Day Course 


An Introduction to


ADP in Library 
and Information 
Systems 


in Washington, D.C. 
May 11 and 12, 1967 


Tuition $100 


For syllabus and 
registration information contact: 


Richard B. Schneider 

Herner and Company 

2431 K Street, N.W., Washington, D.C. 20037 
Telephone (202) 965-3100 


* Federal employees should contact their Career
Counseling Officer for information on this course
which also is conducted by Herner and Company
under the auspices of the U.S. Civil Service
Commission.

Progress Through Effective Use of Information.


Can language processing be practical? 
The answer is, “Yes, definitely yes.”* 


AUTOMATED LANGUAGE PROCESSING 
The State of the Art 


Edited by HAROLD BORKO, Associate Head, Language Processing and Retrieval
Staff, Research and Technology Division, System Development Corporation


A thorough, up-to-date study of research in
the use of computers to process natural
languages for information purposes. Storage
and retrieval, stylistic analysis, machine
translation, question answering, and
typesetting are covered fully, demonstrating the
advances made in automated techniques
being applied today in this important new
area of information science.

The volume, comprised of eleven chapters
written by recognized experts in the field,
is divided into four main sections. The first
examines the various functions basic to
language processing and relates these
activities to computer processing systems. The
second deals with the statistical techniques
of language analysis as applied to indexing
and classifying documents and extracting
and abstracting their contents. Also
discussed is the value of statistical techniques
to analyze an author's style of writing in
resolving issues of disputed authorship.

The third section covers techniques and
applications of syntactical analysis
including the various syntactic theories and
computer techniques for translating one
language into another. The fourth, a single
chapter by the editor, describes how
computers were put to use in the typesetting
and indexing of the book, providing for
the reader a unique demonstration of the
practicality of automated language
processing.


* From the editor's preface, in which he adds: "Unless we can develop more
efficient means of communicating—sharing ideas from person to person
and place to place—human progress will be inhibited."


Contents

Language Data Processing
Introduction (H. Borko)
Language and the Computer (L. Schultz)
Mathematical Models of Language (H. P. Edmundson)

Statistical Analysis
Indexing and Classification (H. Borko)
Extracting and Abstracting (R. E. Wyllys)
Stylistic Analysis (S. Sedelow and W. Sedelow)

Syntactic Analysis
Analyzing English Syntax (D. G. Bobrow)
Answering Questions (R. F. Simmons)
Translating Languages (E. D. Pendergraft)
Designing Artificial Languages for Information Storage and Retrieval
(C. H. Kellogg)

An Experiment in Language Processing
Book Indexing and Typesetting by Computer (H. Borko)

Conclusions
The State of the Art


1967. Approx. 480 pages. Prob. $12.95.


Order from your bookseller or

[Figure: JONKER 552 fully automatic scanner, which records the
search results in punched cards.]


Until relatively few years ago the only effective
way of automating information retrieval involved the
use of the general purpose computer. Since then
the JONKER Corporation has introduced a line of
special purpose information search equipment,
based on optical coincidence.


A recent survey of information installations using
concept coordination, made under the auspices of
the Information Systems Committee of the American
Institute of Chemical Engineers, revealed that 40%
OF THESE INSTALLATIONS NOW USE OPTICAL
COINCIDENCE SYSTEMS, WHILE 31% USE
COMPUTERS. FIFTEEN PERCENT USE PUNCHED CARD
SYSTEMS. ONLY 12% STILL USE MANUAL
SYSTEMS.

[Figure: JONKER 52 card reader.]
[Figure: caption illegible apart from "capacity of up to a million".]


Over 600 JONKER systems are now in operation
throughout the United States and as far away as
Japan, South Africa and Australia. JONKER
systems comprise self-contained installations as well as
systems linked to punched card installations and
computers. One system handles as many as 1,000
entries a day. Another serves a central library and
nine satellite libraries.


Various publications of indexes based on JONKER
systems now have well over 500 users in the
U.S.A. and overseas.


Contact... JONKER CORPORATION
Main Office: Gaithersburg, Maryland (Greater Washington, D. C. area)
Telephone: 301/948-9440

INFORMATION
SCIENCE


AMERICAN DOCUMENTATION


INSTRUCTIONS TO AUTHORS 


American Documentation is a publication of the
American Documentation Institute. It is a scholarly journal in the
various fields in documentation and serves as a forum for
discussion and experimentation. Papers already published or
in press elsewhere are not acceptable. For each proposed
contribution, one original and two copies (in English only)
should be mailed to Mr. Arthur W. Elias, Editor,
American Documentation, Institute for Scientific Information,
325 Chestnut St., Philadelphia, Pennsylvania 19106. The
manuscript should be mailed flat in a suitable-sized
envelope. Graphic materials should be submitted with suitable
cardboard backing.


TYPES OF MANUSCRIPTS: Three types of contributions are
considered for publication: full-length articles, brief
communications of 1,000 words or less, and letters to the editor.
Letters and brief communications can generally be
published sooner than full-length manuscripts. Books,
monographs, and reports are accepted for critical review. Two
copies should be addressed to the Review Editor, Dr.
T. Hines, 54 North Drive, East Brunswick, New Jersey.


PROCESSING: Acknowledgment will be made of receipt of
all manuscripts. American Documentation employs a
reviewing procedure in which all manuscripts are sent to two
referees for comment. When both referees have replied,
copies of their comments are sent to authors with the
Editor’s decision as to acceptability. The refereeing
procedure requires about 30 days. Authors receive galley proofs
with a five-day allowance for corrections. Standard
proofreading marks should be employed. Reprint order forms are
forwarded with galleys.

FORMAT: All contributions should be typewritten on white
bond paper on one side only, leaving about 1.25 inches (or
3 cm) of space around all margins of standard, letter-size
(8.5 x 11 inch) paper. Double spacing must be used
throughout, including the title page, tables, legends, and references.
The first page of the manuscript should carry both the first
and last names of all authors, the institutions or
organizations with which the authors are affiliated, and notation as
to which author should receive the galleys for proofreading.
All succeeding pages should carry the last name of the first
author in the upper right-hand corner (0.5 inch from the
top) and the number of the page.


STYLE: In general, style should follow the forms given in
the Style Manual for Biological Journals (SMBJ), published
for the Conference of Biological Editors by the American
Institute of Biological Sciences (1964).

TITLE: The title should be as brief, specific, and
descriptive as possible. Vague and [illegible] titles may delay
publication.

ABSTRACT: An informative abstract of 200 words or less
must be included, typed with double spacing on a separate
sheet. This abstract should present the scope of the work,
methods, results, and conclusions.

ACKNOWLEDGMENTS: Financial support may be listed as
a footnote to the title. Credit for materials and technical
assistance or advice may be cited in a section headed
“Acknowledgments,” which should appear at the end of
the text. General use of footnotes in the text should be
avoided.

GRAPHIC MATERIALS: American Documentation requires
finished artwork. Follow the style in current issues for
layout and type faces in tables and figures. A table or figure
should be constructed so as to be completely intelligible
without further reference to the text. Lengthy tabulations
of essentially similar data should be avoided.

Figures should be lettered in black India ink. Charts
drawn in India ink should be so executed throughout, with
no typewritten material included. Letters and numbers
appearing in figures should be distinct and large enough so
that no character will be less than 2 mm high after
reduction. A line 0.4 mm wide reproduces satisfactorily when
reduced by one-half. Graphs, charts, and photographs should
be given consecutive figure numbers as they will appear in
the text; however, figure numbers and legends should not
appear as part of the figure, but should be typed double
spaced on a separate sheet of paper. Each figure should be
marked lightly on the back with the figure number, author's
name, complete address, and shortened title of the paper.

For figures, the originals with two clearly legible
reproductions (to be sent to referees) should accompany the
manuscript. In the case of photographs, three glossy prints
are required, preferably 8 x 10 inches.

ORGANIZATION: In general, papers should state the back- 
ground and purpose of the study, followed by details of 
methods, materials, procedures, and equipment. Findings, 
discussion, and conclusions should appear in that order. 
Appendixes may be employed where appropriate for ex- 
tensive lists, statistics, and other supporting data. 


BIBLIOGRAPHY: Accuracy and adequacy of the references
are the responsibility of the author. Therefore, literature
cited should be checked carefully with the original
publications. References to personal letters, abstracts of verbal
reports, and other unedited material may be included. If
an as-yet-unpublished paper would be helpful in the
evaluation of a manuscript, it is advisable to make a copy of it
available to the Editor. When a manuscript is one of a
series of papers, the preceding member of the series should
be included in literature cited.


CITATION FORMAT:

Order: Literature cited should be sequentially numbered
as cited.

Authors: Give all authors with arrangement as follows:
Elias, A. W., B. H. Weil, and I. D. Welt

Titles: Give full titles of articles in English, indicating
language of original as: (In Ger.)

Journals: Journal titles should be given in full.

MONOGRAPH AND SERIAL DATA: Should be presented in
order as follows: Volume, issue number, pagination, and
year. The issue number should be given in parentheses if
journal pagination is not continuous from issue to issue.
Pagination should be inclusive. Year of publication should
be given in parentheses. An example is given below:

Bishop, D., A. L. Milner, and F. W. Hoper, Publication
Patterns of Scientific Serials, American Documentation,
16 (No. 2): 113-21 (1965).

American Documentation is published in January, April, 
July, and October. One copy is included in the individual 
membership fee ($20.00 per year), three copies in the con- 
tributing membership fee ($100.00 per year), and up to five 
copies in the sustaining membership fee ($500.00 per year). 
Nonmembers may subscribe at $18.50 per year, postpaid in
the U.S. Single copies may be purchased for $4.65 each.


Communications concerning memberships, subscriptions, re- 
prints, renewals, back issues, advertising, and changes of 
address should be sent to the American Documentation 
Institute, 2000 P Street, NW, Washington, D. C. 20038. 
American Documentation is indexed in Library
Literature, Current Contents of Space, Electronic & Physical
Sciences, Library Science Abstracts, Science Citation Index,
Chemical Abstracts, and Documentation Abstracts.


Editorial


The ADI Publication Program


The cover of this issue of American Documentation is intended to indicate, to a small
degree, the proliferation of publications and projects of the American Documentation
Institute since 1964. As chairman of the Publications Committee during that time, I have
seen publications increase to include the Annual Review, an abstract journal,
Documentation Abstracts, annual proceedings, special symposia, the memorial to H. P. Luhn, now in
press, and many others now being prepared or designed.

All of this activity can be ascribed to the foresight and support of four separate
administrations of the Institute. It owes much to the encouragement of the Executive
Director, Mr. James Bryan, and to the dedicated members of the Publications Committee,
John Markus, Joe Kuney, Charles Bourne and Mary Stevens. Finally, and ultimately,
it is based on the contributions of the membership who write the articles and support
the publications.


In 1967, the Council awarded the Editor of American Documentation the first 
honorarium ever afforded the occupant of this office. In recognition of the special
responsibilities that this implies and desiring to increase the scope and size of American 
Documentation to meet the needs of the membership and of the documentation community, 
your Editor has declined reappointment to the Publications Committee. Another, still to 
be named, will take on the guidance of this growing area. I know that the loyalty and 
cooperation which I have received over the years will be afforded to the new incumbent 
and that the Publications Committee will continue the work which has had such a 
promising beginning. 


A. W. ELIAS


A Film System for the Duplication of Termatrex Cards


The Termatrex information retrieval system marketed
by Jonker Business Machines, Inc., uses large
plastic cards containing drilled holes. Duplicating
the plastic cards to make identical copies for
distribution requires multiple drilling and is costly,
time-consuming, and error-prone. The Eastman Kodak
Company has developed a method of photographing the
plastic cards on cut sheet film approximately one-half
the dimensions of the Termatrex cards. The film
sheets can be manipulated in the same manner as the
original plastic cards. The problem of supplying
duplicate decks is easily handled by photographic
copies of one original Termatrex deck.


C. W. BAKER, C. R. HAEFELE, and W. A. RECKHOW

Research Laboratories
Eastman Kodak Company
Rochester, New York


* This article is presented for information only, and nothing herein
is to be construed as a recommendation or inducement to infringe
patents or copyrights of others.


The Termatrex Information Retrieval System (1, 2)
marketed by Jonker Business Machines, Inc.,
Gaithersburg, Md., utilizes plastic cards containing drilled holes.


Each card represents a descriptor, and each hole in a 
given card represents a reference. Superimposition of 
the cards makes it possible to note which holes are in 
common. The locations of the holes yield reference 
numbers for the documents that contain descriptors in 
common. 
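
The superimposition described above amounts to a set intersection. The following is only a minimal sketch in Python, not part of the Termatrex equipment: the descriptor names and reference numbers are invented for illustration, and each card is assumed to be representable as the set of reference numbers drilled into it.

```python
# Hypothetical data: each descriptor card is modelled as the set of
# document reference numbers whose holes have been drilled in it.
cards = {
    "film": {12, 40, 305, 9001},
    "duplication": {40, 77, 305},
    "retrieval": {5, 40, 305, 6120},
}


def coincident_holes(cards, descriptors):
    """Return the reference numbers common to all the named descriptor cards."""
    selected = [cards[d] for d in descriptors]
    if not selected:
        return set()
    # Superimposing cards leaves light only where every card has a hole.
    return set.intersection(*selected)


hits = coincident_holes(cards, ["film", "duplication", "retrieval"])
print(sorted(hits))
```

Adding a descriptor to the query can only narrow the result, just as adding a card to the stack can only block more light.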


Fig. 1. Termatrex card camera, with a card on the easel (right) [caption partly illegible]

In the system we are using in the Research
Laboratories, the top of each card is printed in one of 10 colors
and has numbered positions 00-99 from left to right
(Fig. 2). These numbers are located visually by tabs
extending from the top of the card. The individual card
is located by the color and number of the tab. One card
can record reference numbers of 10,000 documents, and
one deck of cards (one color) can accommodate 100
descriptors. The 10 color decks can, therefore,
accommodate 1,000 descriptors.
Although the Termatrex System is versatile and
economical, it has several disadvantages when duplicate
decks are required. Unless all duplicates are drilled at
one time, file maintenance is time-consuming, costly, and
prone to error. A serious disadvantage occurs whenever
decks must be updated. Either drilling equipment must
be provided at each Termatrex file location, or the
Termatrex cards must be shipped to a single location for
drilling. One solution would be to maintain two decks,
and to use one while the other is being updated. This
doubles the cost and the possibility of drilling errors.


A system was designed by the Eastman Kodak
Company to produce black-and-white film duplicates
(filmcards) of Termatrex card decks. Filmcards can be
superimposed and used in the same manner as Termatrex
cards. The size of the filmcard is approximately one-half
the dimensions of the Termatrex card. Drilled
Termatrex cards have holes (transparent areas) and non-hole
areas (opaque areas). Filmcards (film duplicates of
Termatrex cards) also have holes which are
transparent and non-hole areas which are opaque. Equipment
was built to photograph, store, and manipulate
filmcards (Figs. 1, 2, 3, 4, and 5).

The ease of handling the finished filmcards was
evaluated. No registration problems were encountered, and
the readability of the hole coincidences was almost as
good as that of the original Termatrex cards. Use of the
filmcards involved two problems: (1) selection of the
desired descriptor card from the entire deck, and (2)
reading of the locants or reference numbers. The color
and number were printed in large letters at the top of
each Termatrex card (Fig. 2). The identification was
easily read on the black-and-white filmcard. All
filmcards for one color were grouped together in a single
module. The number order within a module may be
random. Along the edge of each filmcard are
drilled 10 holes for the tens position and 10 holes for the
units position (Fig. 3). These holes were selectively
notched to correspond with holes drilled in the
Termatrex originals. To select “Red 53,” for example, the
module containing all Red-color-coded filmcards is used.
Two pins are inserted in the module: at 5 in the tens
position, and at 3 in the units position. The module is
then inverted, and only the “Red 53” filmcard will fall
out (Fig. 4). The filmcards and the module have been
corner-cut to ensure proper orientation.


Fig. 3. Filmcard storage module
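
The pin-and-notch selection can be modelled in a few lines. This is a rough sketch under a stated assumption that the article does not spell out: a pin is taken to hold every card whose edge hole at that position is intact, so a card drops out only if it has been notched open at both pinned positions. The function and field names are hypothetical.

```python
def make_filmcard(color, number):
    """Model a filmcard by the two edge positions that have been notched open."""
    tens, units = divmod(number, 10)
    return {"color": color, "number": number,
            "tens_notch": tens, "units_notch": units}


def invert_module(module, tens_pin, units_pin):
    """Return the cards that fall out when the pinned module is inverted.

    Assumption: a pin holds every card whose hole at that position is intact,
    so a card drops only if it is notched at both pinned positions.
    """
    return [card for card in module
            if card["tens_notch"] == tens_pin
            and card["units_notch"] == units_pin]


# One module holds the 100 filmcards of a single color; order is irrelevant.
red_module = [make_filmcard("Red", n) for n in range(100)]
fallen = invert_module(red_module, tens_pin=5, units_pin=3)
print([(c["color"], c["number"]) for c in fallen])
```

Because selection depends only on the notches, the cards need not be kept in numerical order within the module, which matches the remark that the number order may be random.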


The reading of the locants for data retrieval required
some means of accurately superimposing filmcards and
recording the coordinates of the coincident holes. An
inexpensive Plexiglas plastic viewing plaque (Fig. 5),
which could be placed on an illuminator, was constructed.
The filmcards selected are placed in the plaque, and
over them is placed a translucent matte grid overlay.
The coincident holes are marked with a soft pencil on
the overlay. When the overlay is removed, the pencil
marks on the grid enable the coordinates to be easily
read. The marks may be erased and the grid reused. The
use of replicate decks makes economical distribution of
copies feasible. One central deck of Termatrex cards
is kept updated, and filmcard copies are circulated to
other file locations. The previous filmcard decks are

then discarded. The use of filmcard decks eliminates the
possibility of errors caused by multiple drilling.


Book-Indexes as Building Blocks for a Cumulative Index


The advantages of a cumulative subject-index, built
by merging the subject-indexes of books, are
illustrated; and the conditions under which such cumulation
would be feasible are discussed. A mathematical
model for estimating the overlap between book
indexes, applied to a sample of books belonging to the
same subject area, shows that the overlap is surprisingly
small. Some statistical properties and grammatical
features of book-indexes are described in an attempt
to determine how much they depart from the
characteristics of book-indexes ideally suited for cumulation.
Possible reasons accounting for the great variability of
index-entries are discussed.


MANFRED KOCHEN and RENATA TAGLIACOZZO

Mental Health Research Institute
University of Michigan
Ann Arbor, Michigan


* Introduction


Is it feasible to cumulate the wealth of index-terms
present in existing book-indexes to compile a
comprehensive “catalog” for the world’s book literature? This
question may be of practical interest, and at the same
time it may offer a good research vehicle for some insight
into indexing theory. From both points of view, it is
pertinent to investigate conditions under which it is
profitable to merge a given collection of book-indexes to
produce a cumulative index. Our first step is to identify
these conditions and to observe the extent to which they
prevail in current book-indexing practice.

Of the elusive process that leads from a fragment of
prose to a list of words or phrases defining its semantic
content, a book-index is one of the best known and most
r[illegible]ble products. The history of such alphabetical
indexes can be traced back to the sixteenth century (1).
Lists of terms known as “tables,” “calendars,” “syllabi,”
and “registers” were used in previous centuries to
indicate the contents of books. Readers and authors of
scholarly books seem to be in implicit agreement about
what a book-index accomplishes, or should accomplish.
Even if Lord Campbell's (1859) intention of bringing a
“bill into Parliament to deprive an author who publishes
a book without an index of the privilege of copyright
and moreover to subject him for his offense to a
pecuniary penalty” (1, p. 28) was never carried into action
and a good number of indexless books are still published,
the value of indexes is universally recognized. Yet it is
impossible to find in the literature any satisfactory
description of standards for differentiating good indexes
from bad indexes, or for that matter, any convincing
explanation of the indexing process.

A few treatises and handbooks on index-making have
been published at various times (2) to illustrate current
procedures for compiling indexes. While focusing their
attention on technical details, such as alphabetization
of index entries, cross-references, hierarchy of headings,
they omit problems of a more fundamental nature, e.g.,
concerning the structure of an index. In vain does one
try to obtain from these publications an answer to
questions of the following kinds: (a) How does one
select the terms suitable to index a particular fragment
of text? or (b) What constitutes an index entry?

Current research on indexing (3, 4) aims, on the whole,
at systematic procedures to produce an index from a
corpus of text. Investigators engaged in this kind of
research start from the assumption that the products of
traditional indexing practices are generally inconsistent
and have high cost/effectiveness ratios. Their
investigations aim to introduce revolutionary changes in indexing
practice and are more relevant to texts being generated
from today on than to the already-indexed past
literature. In fact, none of the proposed indexing schemes
has as yet been proved to yield a sufficiently low
cost/effectiveness ratio to warrant converting the texts of past
literature into machine-readable form for reindexing by
computer. Nor has any proposed scheme been sufficiently
forward looking and favorably evaluated to persuade the
intellectual community to abandon traditional indexing
practice in its favor (5, 6).

The possibility of a simple system to bridge the
transition between currently used indexing practices and
future, radically improved practices may thus deserve
serious consideration. Poor as they may be, existing
indexes do contain a wealth of information that could be
used to create a vastly more effective index at a
reasonable cost. Moreover, the problems solved and the
procedures used in this operation could provide the base on
which to build realistic and radically [illegible] indexing
[illegible].
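
To make the idea of cumulation concrete, the following is a minimal sketch in Python of merging several book-indexes into one and measuring how many terms two indexes share. It is an illustration only, not the authors' mathematical model: the book titles and page numbers are invented, terms are matched by simple lower-casing, and overlap is taken here as a plain Jaccard ratio.

```python
from collections import defaultdict


def cumulate(book_indexes):
    """Merge per-book indexes into one mapping: term -> [(book, page), ...]."""
    cumulative = defaultdict(list)
    for book, index in book_indexes.items():
        for term, pages in index.items():
            for page in pages:
                # Assumption: terms that differ only in case are the same term.
                cumulative[term.lower()].append((book, page))
    return dict(cumulative)


def term_overlap(index_a, index_b):
    """Fraction of distinct terms shared by two indexes (a Jaccard ratio)."""
    a = {t.lower() for t in index_a}
    b = {t.lower() for t in index_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical book-indexes for illustration.
books = {
    "Book A": {"Frequency standard": [112], "Clocks": [7, 30]},
    "Book B": {"frequency standard": [45], "Time": [3]},
}
merged = cumulate(books)
print(merged["frequency standard"])
print(term_overlap(books["Book A"], books["Book B"]))
```

Exact string matching of this kind is the most naive way to detect a shared term, so it would tend to understate overlap wherever two indexers word the same concept differently.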


* Why a Cumulated Index?


A cumulated index built by merging the subject-indexes
of a large number of books would be equivalent,
in many respects, to a “subject catalog.” This, however,
is not to be visualized as the room full of card-filled
drawers now found in a library. Nor need it necessarily
be in book form, like the National Union Catalog. This
is too large and too costly to produce, update, and distribute
as widely as desirable. Instead, consideration
should be given to the effective use of time-sharing
technology, both in the cumulation and in the
use of the “end-product.” Indeed, the “end-product”
would be a continually changing cumulation of book-indexes
stored in large-capacity, digital-computer stores.
Users would interrogate this catalog from remote access
stations, which would be connected to such a digital
store for on-demand interrogation. For example, a user
seeking an operational definition of the unit of time
would have to know that “frequency standard” or
“standard second” are the closest “jargon” terms to use;
he is not likely to find much of relevance if he performs
the search by using subject-headings like “Time,”
“Clocks,” etc. If he does somehow hit on “frequency
standard,” he may be referred, among other references, to
the American Institute of Physics Handbook, sec. 5, p.
112, where he would immediately learn that the standard
second is 1/86,400 of a mean solar day; that a quartz-crystal
oscillator at the Bureau of Standards is used to
monitor a 5-cycle radio pulse at a frequency of 1 kc at
the beginning of the last second of each minute, to an
accuracy of 1 microsecond. If this is what he was really
after, rather than, say, an explanation of the Heisenberg
Uncertainty Principle, or the smallest time interval that
the latest nuclear counters can record, he would be led
to this as rapidly as he could use the console of the
time-sharing computer, i.e., within minutes. Of course,
if the user of the cumulated index is to obtain an
immediate and relevant response to his query, then at
least one of the books that contain the statement he
needs should have an appropriate index entry. (Ideally
each of the books should have an appropriate index
entry.)

What are the advantages of using a cumulation of book
indexes over current practices? In current practice, to
locate a relevant passage on, say, “frequency standards,”
a user would first have to select a subject-heading under
which a book containing this passage is likely to be
cataloged. There is quite a variety of subject headings
the user may think of and an even greater variety he
would not think of. Supposing that the appropriate
subject headings were found, the user might then have
to scan the titles or even the tables of contents of so
many irrelevant books that he might well doubt whether the
value of meeting his need is worth all this effort. This
sense of doubt is, of course, very much a question of
attitude. Convenience of using the index by suitable
display techniques, timing of responses to queries, etc.
can affect this attitude as much as the quality of indexing
and may often spell the difference between the user's
bothering to go after a piece of information or judging
its value to be less than the effort to hunt for it.

If the advantages of a cumulated book-index were
limited to convenience of use and speed of retrieval, it
is doubtful that, from a practical point of view, attempts
to construct such an index could be justified. The
superiority of a cumulated index, however, is based
primarily on the fact that, through it, a large number of
points of access to the content of books would become
available. In existing book catalogs the subject headings
assigned to each book are very few (in the Library of
Congress Catalog, for instance, there are about 1.6 per
book). In the Cumulated Index there could be as many
as there are index entries in the book-index, i.e., a few
hundred. This means that when seeking information on
a specific issue, one would not be forced to search under
a generic subject-heading, which may or may not give
access to the desired information (and which would
certainly produce a large number of irrelevant answers),
but would have direct access to the information via the
specific index-entry.


• Some Quantitative Aspects of Cumulation


Consider a collection of t books. A fraction p of these
contain passages relevant to an index term x, e.g., x =
“frequency standard.” Pick one of the pt relevant books
at random and let r denote the conditional probability
that it contains the term x in its index given that it is
relevant. This relevant book may fail to contain x because
the author did not index the passage, or because he
assigned a term other than x to it, etc. If the querist
is satisfied with any one of the pt relevant books, he
will fail to be satisfied if none of the relevant books
has x in its index. If the events of several relevant
books each failing to contain x in their index can be
assumed to be statistically independent, then the probability
that each of the pt relevant books does not contain
x in its index is $(1-r)^{pt}$. The conditional probability
of retrieving a book from the merged indexes of
all t books, given that the book is relevant, is $1-(1-r)^{pt}$.
This is also the hit-rate (7) or recall-ratio (5).

Now pick one of the t − pt irrelevant books at random.
Let r' be the probability that it does not contain x in
its index. It could, erroneously, contain x in the index
if the author, for example, indexed a passage under x
even though the passage merely mentions x or something
about x, yet gives no substantive information. Or it
could contain x as an appropriate index term; but if x
means something different to the indexer and to the
querist, the book may be irrelevant. If the assumption
that such errors in several books occur independently
is valid, then the probability that all the t − pt irrelevant
books fail to contain x is $r'^{\,t-pt}$, and the probability
that at least one of them contains x is $1-r'^{\,t-pt}$. With
the help of Bayes' rule we can now express the conditional
probability of a book being relevant given that it is
among those that contain x in its index in the merged
index of t books (also known as acceptance-rate (7) or
relevance-ratio (5)) as

$$
P(\text{relevant} \mid \text{retrieved})
= \frac{P(\text{retr.} \mid \text{relev.})\,P(\text{relev.})}
       {P(\text{retr.} \mid \text{relev.})\,P(\text{relev.}) + P(\text{retr.} \mid \text{irrelev.})\,P(\text{irrelev.})}
= \frac{[1-(1-r)^{pt}]\,p}{[1-(1-r)^{pt}]\,p + [1-r'^{\,t-pt}]\,(1-p)}
= \frac{1}{1 + \dfrac{1-p}{p}\cdot\dfrac{1-r'^{\,t-pt}}{1-(1-r)^{pt}}}.
$$

This tends to p as t increases; that is, if
the indexes to many books are merged into a combined
index, then the chances of finding a relevant passage by
looking under x are not much better than when selecting
one of the t books at random. It can, however, be kept
quite large when t is not too big.
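The merged-index quantities above can be evaluated numerically. The following is a minimal sketch, assuming the independence conditions stated in the text; the function names and the parameter values in the demonstration loop are illustrative choices, not figures from the study.

```python
def merged_hit_rate(t, p, r):
    """P(retrieved | relevant) for the merged index: 1 - (1 - r)**(p*t)."""
    return 1.0 - (1.0 - r) ** (p * t)


def merged_false_hit_rate(t, p, r_prime):
    """P(at least one of the t - p*t irrelevant books lists x): 1 - r'**(t - p*t)."""
    return 1.0 - r_prime ** (t - p * t)


def merged_acceptance_rate(t, p, r, r_prime):
    """P(relevant | retrieved) from Bayes' rule, as in the expression above."""
    hit = merged_hit_rate(t, p, r)
    false_hit = merged_false_hit_rate(t, p, r_prime)
    return hit * p / (hit * p + false_hit * (1.0 - p))


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not taken from the paper).
    p, r, r_prime = 0.1, 0.8, 0.99
    for t in (10, 100, 1000, 10000):
        print(t, round(merged_acceptance_rate(t, p, r, r_prime), 4))
```

For large t both bracketed terms approach 1, so the printed acceptance rate settles at p, which is the limiting behavior described in the text.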

Compare these results with the corresponding quantities
under the present system. Given the index term x,
it is necessary to pick first a suitable subject heading y.
Then as many books cataloged under y must be retrieved
and scanned for the presence of x in the book-index as
necessary to obtain a match. Consider, as before, a collection
of t books, pt of which contain a needed passage.
Let r and r' be defined as before. Now let R be the
probability that a book will be retrieved given that it
contains x in its index, and R' the probability that a
book will not be retrieved given that it does not contain
x in its index. These two quantities depend upon the
“thesaurus” in which the relation between an index-term
like x = “frequency standard” and various subject-headings
is specified. They also depend upon the cataloger’s
judgment in assigning subject-headings to books.
It can be shown that the probability that a relevant
book is retrieved is now Rr + (1 − R')(1 − r). This does
not depend on t, as might be expected. To derive this,
it was assumed that the conditional probability of a
book being relevant, given that it is retrieved through
the catalog and that x is in its index, is equal to the
conditional probability of a book being relevant given
that x is in its index. In a similar manner, it can be
shown that the probability of an irrelevant book being
retrieved is now R(1 − r') + (1 − R')r'.

To take a sample calculation, let t = 100 books on
nearly the same subject. We assume that by chance
alone 1 out of such a hundred contains the needed passage,
so that p = .01. Suppose further that R = R' = .9
and r = r' = .95. Then the hit-rate and acceptance-rate
for the present system are 0.86 and 0.14, respectively.
For a merged index they would be 0.95 and 0.60,
respectively.
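As a check on the arithmetic, the closed-form expressions given in the text can be evaluated at the sample values. This is a sketch only: the function names are hypothetical, and it covers just the three expressions stated explicitly above (relevant book retrieved, irrelevant book retrieved, and merged-index hit-rate).

```python
def present_relevant_retrieved(R, R_prime, r):
    """Present system: P(retrieved | relevant) = R*r + (1 - R')*(1 - r)."""
    return R * r + (1.0 - R_prime) * (1.0 - r)


def present_irrelevant_retrieved(R, R_prime, r_prime):
    """Present system: P(retrieved | irrelevant) = R*(1 - r') + (1 - R')*r'."""
    return R * (1.0 - r_prime) + (1.0 - R_prime) * r_prime


def merged_hit_rate(t, p, r):
    """Merged index: P(retrieved | relevant) = 1 - (1 - r)**(p*t)."""
    return 1.0 - (1.0 - r) ** (p * t)


if __name__ == "__main__":
    t, p = 100, 0.01
    R = R_prime = 0.9
    r = r_prime = 0.95
    print(round(present_relevant_retrieved(R, R_prime, r), 2))          # 0.86
    print(round(present_irrelevant_retrieved(R, R_prime, r_prime), 2))  # 0.14
    print(round(merged_hit_rate(t, p, r), 2))                           # 0.95
```

With these inputs the value 0.14 is obtained from the expression for an irrelevant book being retrieved.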

To estimate the size of the cumulated index, which in
turn determines the time needed to look up an index
term, let $n_1$ be the average number of index entries in a
book. Let $n_2$ be the average number of index entries
common to any two books. Generally, let $n_k$ be the
average number of index entries, each of which appears
in each of a random sample of k books. The average
number of new terms added by adjoining a second index
to the starting index will be $n_1 - n_2$. The total number
of different entries in the merger of the two will be
$n_1 + (n_1 - n_2)$. The number of new terms contributed by
adjoining a third index will be $n_1 - 2n_2 + n_3$; by a fourth,
$n_1 - 3n_2 + 3n_3 - n_4$. Generally, the number of new entries
contributed by the (k+1)th index is

$$
\binom{k}{0} n_1 - \binom{k}{1} n_2 + \binom{k}{2} n_3 - \cdots \pm \binom{k}{k} n_{k+1}
= \sum_{j=0}^{k} (-1)^j \binom{k}{j}\, n_{j+1},
\qquad k = 0, 1, 2, \ldots, t-1.
$$

The total number $N_t$ of different index entries in the
cumulated index resulting from merging t book-indexes is

$$
N_t = \sum_{k=0}^{t-1} \sum_{j=0}^{k} (-1)^j \binom{k}{j}\, n_{j+1}.
$$


If it could be assumed that $n_k = A s^k$, where A is a
constant greater than 1 and s a constant such that
0 < s < 1, then it is easily shown with the help of the
binomial theorem and the expression for the sum of a
geometric series that

$$
N_t = A\,[1 - (1-s)^t].
$$

As t increases, $N_t$ converges exponentially from below
to A as a horizontal asymptote. Thus, A is the ultimate
size the index could attain under this assumption. A
would be smaller if all books were on the same topic
than if they were not. The smaller s, the more rapidly
$N_t$ converges to A; the larger s, the more slowly. Thus
1/s indicates the degree to which different authors use
the same index terms. If, even in a narrow field, authors
varied greatly in the index terms they assigned (even
to the same passage), then s would be close to 1; if
they adhered to a standard set of index terms, s might
be close to 0. The latter condition would more likely
hold in physical sciences where the jargon is standardized;
the former in historical studies, where there is less
agreement about how to label a finding.
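The inclusion-exclusion count of new entries, and the closed form it yields under a geometric assumption, can be checked numerically. The sketch below assumes the overlap function is supplied as a Python callable `n(k)`; the names `new_entries`, `cumulated_size`, and `geometric` are illustrative, not from the text.

```python
from math import comb


def new_entries(k, n):
    """Entries contributed by the (k+1)-th index.

    n(j) is the average number of entries common to j books, so the
    contribution is sum_{j=0..k} (-1)**j * C(k, j) * n(j + 1).
    """
    return sum((-1) ** j * comb(k, j) * n(j + 1) for j in range(k + 1))


def cumulated_size(t, n):
    """Total number of distinct entries after merging t indexes."""
    return sum(new_entries(k, n) for k in range(t))


def geometric(A, s):
    """Overlap function n_k = A * s**k (the assumption examined in the text)."""
    return lambda k: A * s ** k


if __name__ == "__main__":
    A, s, t = 1000.0, 0.4, 10   # illustrative values only
    direct = cumulated_size(t, geometric(A, s))
    closed = A * (1.0 - (1.0 - s) ** t)
    print(direct, closed)       # the two agree up to rounding error
```

Because the sum alternates in sign with binomial weights, the direct evaluation loses precision for large t; the closed form is the safer route whenever the geometric assumption is being explored.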

The assumption $n_k = A s^k$ is not tenable, however. To
determine how $n_k$, the number of index entries which
appear in each of k randomly selected books, decreases
with k, we selected 10 books in the field of “learning.”
Of the 45 possible pairs, we selected 20 at random and
counted the number of index terms-in-common for both
members of each pair (Table 1). To compensate for the
variance in the size of the various indexes, we divided
the number of terms-in-common by the product of the
number of index terms in each of the two indexes. This
has been shown to be an unbiased, minimum-variance
estimate. The median value of these 20 numbers was
$0.543 \times 10^{-4}$. The median rather than the mean had to
be used because the distribution is quite skew.
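The counting procedure just described can be sketched in code. This is a minimal illustration, assuming each book-index is available as a non-empty Python set of entry strings and that two entries count as common only when they match exactly; the function names, the sampling seed, and the toy data in the usage note are hypothetical.

```python
import random
from itertools import combinations
from statistics import median


def normalized_overlap(indexes):
    """Number of entries common to all given indexes, divided by the
    product of the index sizes (indexes must be non-empty sets)."""
    common = set.intersection(*indexes)
    product = 1
    for idx in indexes:
        product *= len(idx)
    return len(common) / product


def median_overlap(indexes, k, sample_size, seed=0):
    """Median normalized overlap over a random sample of k-tuples of indexes."""
    groups = list(combinations(indexes, k))
    rng = random.Random(seed)
    chosen = rng.sample(groups, min(sample_size, len(groups)))
    return median(normalized_overlap(g) for g in chosen)
```

For example, with three toy indexes `{"learning", "memory", "transfer"}`, `{"learning", "memory"}`, and `{"learning", "habit"}`, calling `median_overlap(indexes, 2, 3)` examines all three pairs and returns the middle of their normalized overlaps.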


Of the $\binom{10}{3} = 120$ possible book triples, 30 were selected
at random. One triple had 10 index entries in common,
another triple had 7, another 5, two triples had 4, five
triples had 3, four had 2, ten triples had 1, and six
triples had no index entries in common. Dividing the
number of entries-in-common by the product of the sizes
of the three indexes, the median of these 30 numbers was
$0.038 \times 10^{-6}$.

Repeating this procedure for quadruples gave a median
of $0.00525 \times 10^{-8}$. In the case of quintuples, only 1
of 30 had one index entry common to all 5.

TABLE 1. Similarities among subject-indexes of 10 books on “learning”
(sample from all possible combinations of the 10 book-indexes)

Number of
identical        Pairs      Triplets   Quadruplets  Quintuplets
index-entries    (N = 20)   (N = 30)   (N = 30)     (N = 30)
     30             1          0          0            0
     19             1          0          0            0
     17             1          0          0            0
     15             1          0          0            0
     12             1          0          0            0
     11             3          0          0            0
     10             2          1          0            0
      9             1          0          0            0
      8             1          0          0            0
      7             3          1          0            0
      6             0          0          0            0
      5             0          1          0            0
      4             3          2          0            0
      3             1          5          1            0
      2             1          4          1            0
      1             0         10          7            1
      0             0          6         21           29

The average number of index entries per book was about
400. We therefore estimated $n_2$ by $(400)^2 \times 0.543 \times 10^{-4}
= 8.68$, $n_3$ by $(400)^3 \times 0.038 \times 10^{-6} = 2.43$, and $n_4$ by
$(400)^4 \times 0.00525 \times 10^{-8} = 1.34$. On a log-log plot, log $n_k$ vs. log k
is very nearly linear for these points, and a good fit is
given by

$$
n_k = 38.6\,k^{-2.5}, \quad k = 2, 3, \ldots; \qquad n_1 = 400.
$$

Without evaluating

$$
N_t = \sum_{k=0}^{t-1} \left[\, 400 - 38.6\binom{k}{1} 2^{-2.5} + 38.6\binom{k}{2} 3^{-2.5} - \cdots + (-1)^k\, 38.6\,(k+1)^{-2.5} \right]
$$

it is easy to see that the index resulting from a cumulation
of t books will have fewer than t times the average
number of entries in one book index. This number will,
however, grow significantly with t and will not reach an
asymptote as t gets large, as was the case for the assumption
$n_k = A s^k$.
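The estimates of $n_2$, $n_3$, and $n_4$ follow from multiplying each normalized median by the product of the index sizes, here approximated by a power of the average index size. A minimal sketch, assuming the average size of 400 and the medians $0.543 \times 10^{-4}$, $0.038 \times 10^{-6}$, and $0.00525 \times 10^{-8}$ quoted in the text; the function name is illustrative.

```python
def estimate_common_entries(avg_index_size, k, normalized_median):
    """Estimate n_k as (average index size)**k times the normalized median overlap."""
    return avg_index_size ** k * normalized_median


if __name__ == "__main__":
    medians = {2: 0.543e-4, 3: 0.038e-6, 4: 0.00525e-8}
    for k, m in medians.items():
        print(k, round(estimate_common_entries(400, k, m), 3))
```

The results agree with the figures quoted in the text to within rounding in the last digit.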

The following observations are noteworthy.

1. The extent to which several indexes contain precisely
the same index entries is (surprisingly) small.

2. The rate of decrease in the number of index terms-in-common
to k indexes with k is less than exponential;
it is such that for large k the number is still significantly
high.

Condition 1 does not exclude the feasibility of cumulating
book-indexes. It tells us merely that the cumulated
index will be quite large and will continue to grow
as we add indexes. If we wish to maintain a limiting
size, we will have to purge it of less significant
entries.

To make a large cumulated index usable, the index
terms will have to be richly interconnected, cross-referenced
through relations of synonymy, part-whole, generic-specific,
etc. Moreover, our estimates were based on requiring
that a pair of index terms be matched in every
(linguistic) detail to be called common to two indexes.
The effect of less rigid criteria for the similarity or
relatedness of two index terms on the production and use
of the cumulated index is being investigated separately.


• Conditions Favorable to Cumulation of Book Indexes


What are the properties of a book-index which would
make it suitable for merging with other book-indexes to
produce a comprehensive search tool for the total book
literature? We have already pointed out that, in order
to give a relevant response to a query, the cumulative
index should have an appropriate index entry to the
book which contains the relevant information. Thus, our
first requirement for a suitable book index is:


Rule 1. Each passage in the book likely to answer
an anticipated query should be referenced in the index.


Conformity to this rule is as difficult to test as it is
important. The book indexer, frequently the author of
the book, hastily and casually assigning “index terms”
while proofreading the final galley, usually cannot anticipate
all the queries his book may answer. Perhaps he
can anticipate a few hundred of the queries he intends
to answer. This may account for half of all the queries
that would be asked if really good access to all the potential
answers were available. Perhaps the other half of
all such queries could not have been anticipated by anyone
at the time the book was written. Perhaps people
other than the author can more easily think of possible
queries. Nonetheless, most existing books provide some
references to the most important passages in the book,
and could serve as a useful starting point.

A second important prerequisite for cumulating book-indexes
is the following:


Rule 2. Each index entry should be in a standard
linguistic form.


The first problem encountered when one compares two
indexes, in fact, consists of determining whether two
linguistically different index terms are the same. For
example, “frequency standards” and “standards of frequency”
are probably equivalent. Linguistic preferences
of this type vary considerably in book indexes. Yet
there may be certain statistical regularities in the use of
various grammatical forms. The double noun form as
in “frequency standards” may, for example, occur much
more frequently than the equivalent noun-preposition-noun
form. This might suggest a canonical form into
which to transform all index-entries for purposes of
comparison.
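One way to picture such a canonical form is a rule that rewrites noun-preposition-noun entries into the double noun form before comparison. The sketch below is purely illustrative: the preposition list, the three-word restriction, and the function names are assumptions made here, and the rule ignores the many cases in which the two forms are not in fact equivalent.

```python
# Hypothetical rule set: these prepositions and the rewrite itself are
# illustrative assumptions, not a procedure given in the text.
PREPOSITIONS = {"of", "for", "in", "on"}


def canonical_form(entry: str) -> str:
    """Rewrite 'noun preposition noun' as 'noun noun'; leave other entries alone."""
    words = entry.lower().split()
    if len(words) == 3 and words[1] in PREPOSITIONS:
        head, _, modifier = words
        return f"{modifier} {head}"
    return " ".join(words)


def same_entry(a: str, b: str) -> bool:
    """Compare two index entries after transforming both to canonical form."""
    return canonical_form(a) == canonical_form(b)


if __name__ == "__main__":
    print(canonical_form("standards of frequency"))
    print(same_entry("Frequency standards", "standards of frequency"))
```

Under this rule “standards of frequency” and “frequency standards” compare as the same entry, while entries of other shapes are only lower-cased and whitespace-normalized.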


The third desirable property of book-indexes can be
defined as follows:


Rule 3. If passages from different books treat similar
topics with the same level of specificity, index-entries
having the same level of specificity should be
assigned to those passages.


Deviations from this rule are probably the most
troublesome and the least amenable to standardization
(8). When selecting an index entry appropriate to a passage,
different indexers exhibit great freedom in the
choice of specificity level. Some of them may use very
generic, all-inclusive, one-noun categories, while others
may prefer to index a similar passage with a very specific
entry, in which the same noun appears amidst a large
number of modifiers. Under these conditions, estimating
the number of index-entries common to two indexes becomes
very difficult. Does the difference in specificity
reflect a difference in text specificity, or a difference in
indexers’ preference? To what degree does the size of the
indexes determine the level of specificity of the index-entries?
It is clear that if two books of approximately the
same size have indexes of different length, the larger
index will have a tendency to include more specific
entries than will the shorter index. In which way, then,
will level of specificity, size of index, and size of book be
related? It is not difficult to see that from the interplay
of the various factors outlined here a large variability of
index entries, even within the same subject field, is to be
expected.








• Some Properties of Book-Indexes


Of a sample of 66 books selected at random from a
research library collection in the biological and social
sciences, 10 books (15%) had no subject index at all.
The size of the subject index was distributed over the
remaining 56 books as shown in Table 2.

From this table we can see that about 55% (or over
one-half) of the books have a subject index of less than
400 entries. An index of this size, which takes at most
five pages of two-column print, can be considered a
small index. The medium-size indexes (those including
from 400 to 800 entries) contribute 27% (or a little over
one-fourth) to the total, and finally only 7% of the
sample consists of larger indexes (between 800 and 1,200
entries). The remaining 11% of the indexes spread widely
between 1,200 and 7,000 entries.
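The percentages quoted above can be recomputed from the book counts in Table 2. A minimal sketch, with the counts transcribed from the table (bins with no books omitted) and a hypothetical helper name:

```python
# Book counts per index-size bin, transcribed from Table 2 (empty bins omitted).
TABLE_2 = {
    (0, 200): 14, (200, 400): 17, (400, 600): 10, (600, 800): 5,
    (800, 1000): 2, (1000, 1200): 2, (1200, 1400): 1, (1600, 1800): 1,
    (1800, 2000): 1, (2200, 2400): 1, (2600, 2800): 1, (6800, 7000): 1,
}


def percent_between(table, low, high):
    """Percentage of books whose index-size bin lies within [low, high]."""
    total = sum(table.values())
    in_range = sum(n for (lo, hi), n in table.items() if lo >= low and hi <= high)
    return 100.0 * in_range / total


if __name__ == "__main__":
    for low, high in [(0, 400), (400, 800), (800, 1200), (1200, 7000)]:
        print(f"{low}-{high}: {percent_between(TABLE_2, low, high):.1f}%")
```

Rounded to whole percentages, the four groups come to 55, 27, 7, and 11, which sum to 100 over the 56 indexed books.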

The ratio between number of entries in the index and
number of pages in the book (density of index entries)
varies, in our sample, between 0.15 and 4.71. The distribution
is shown in Table 3.

We see, then, that over 90% of the books in the sample
have an index density lower than 3.00. The average
density is between 1.00 and 1.50 index entries for each page
of book. Considering the variety of factors that influence
the ratio between index size and book size (e.g., the restrictions
imposed by publishing policies, the differences
in level of redundancy of the text and in degree of
specificity of the index lexicon), the variance of the index
density does not seem extremely high.

If one disregards the “function” words (for the most
part prepositions), the majority of index entries are
composed of either one, two, or three words. We found


TABLE 2. Distribution of index size in a sample of 56
scientific books from various fields

Index size
(number of
entries* per     Number of    Percent
index)           books        of sample
   0- 200           14           55
 200- 400           17
 400- 600           10           27
 600- 800            5
 800-1000            2            7
1000-1200            2
1200-1400            1           11
1400-1600            0
1600-1800            1
1800-2000            1
2000-2200            0
2200-2400            1
2400-2600            0
2600-2800            1
   ...
6800-7000            1

* We define an “entry” to be a word or a group of words followed by
one or more numbers specifying the book location (page) containing the
passage referred to.


TABLE 3. Distribution of index entries density in a sample
of 56 books

Density
(entries/page)   Number of books
   0- .50        [count illegible in source]
 .50-1.00        [count illegible in source]
1.00-1.50        [count illegible in source]
1.50-2.00        [count illegible in source]
2.00-2.50        [count illegible in source]
2.50-3.00        [count illegible in source]
3.00-3.50        [count illegible in source]
3.50-4.00        [count illegible in source]
4.00-4.50        [count illegible in source]
4.50-5.00        [count illegible in source]





that in a sample of 10 indexes of small and medium
size, the total one-, two-, and three-term entries account,
on the average, for over 90% of the subject index. The
distribution of the percentages is given in Table 4.

In the top five books of the table, which have 240
entries or fewer, the percentages of the one-term entries are
greater than the corresponding percentages for the five
bottom books having more than 240 terms per index.
This makes sense if one thinks that when available space
is limited, priority is given to one-term entries, which
usually are on a higher level of generalization than are
the two- and three-term entries. (Compare, for instance,
the entries “index,” “book index,” and “alphabetical book
index.”) Only larger indexes, in which space is no problem,
can afford to include a large percentage of three-term
index entries.
The inverse relation between size and percentage of
the one-term entries, which tends to keep the absolute
number of one-term entries within narrow numerical limits,
seems interesting. Possibly, the pool of technical
terms from which single-term index entries are drawn is,
for each scientific area, rather limited in size. This would
explain why the total number of one-term index entries
is not much higher in the larger indexes than in the
smaller ones. The multiple-term index entries then
would be formed by combining these characteristic technical
terms with those derived from the much larger
pool of terms belonging to the everyday (i.e., nonscientific,
nontechnical) language.


TABLE 4. Percentage of index entries of different length in a
sample of 10 books

          Index-    One-term   Two-term   Three-term
          entries   entries    entries    entries      Residue
          No.       %          %          %            %

Book A    150       54.6       38.7        6.7
Book B    150       32.7       40.6       18.7          6.0
Book C    151       48.4       27.8       13.9          9.9
Book D    28b       42.1       44.5       11.9          1.2
Book E    240       68.3       27.5        4.2

Book F    369       15.7       47.7       29.3          7.3
Book G    401       34.9       53.6       10.0          1.5
Book H    522       33.7       44.2       19.2          2.9
Book I    646       13.6       41.5       28.3         16.6
Book J    671       17.0       37.8       24.3         20.3


We were interested in finding out which grammatical
forms are prevalent in index entries. On this subject,
manuals for index compilation are explicit, even if brief:
nouns or noun phrases should be used as index entries.
We found that most of the indexers abide by these rules.
This is not enough, however, to keep the various indexes
within the limits of a standard grammatical format, as
anybody can realize at first glance by comparing a
small number of indexes.

In our sample of 10 book-indexes, we identified 25
types of grammatical combinations which account, on
the average, for about 98% of the one-, two-, and three-term
entries, and for about 92% of the total number of
entries.

These 25 types were differentiated on a purely
morphological basis, without giving consideration to the
fact that some of them are semantically equivalent and
are used interchangeably in current language (e.g.,
noun-noun vs. noun-preposition-noun).


The extent to which the 25 grammatical forms exhaust
all the possible combinations of terms appearing in index-entries
depends, to a large degree, on the size of the
index. Small indexes, which contain a limited number of
three-term entries and very few four-term or longer
entries, fit almost completely within our 25 categories;
while large indexes, which have a broader repertoire
of multiple-term combinations, show a larger residue of
unclassified entries.

The percentages of the various grammatical forms
occurring in each of the 10 indexes of our sample were
calculated. The 10 higher-ranking grammatical forms are
shown in Table 5, and their average percentage, over the
10 indexes of the sample, is given.

Altogether the combinations shown in Table 5 account
for over 80% of the total number of entries and can
therefore be considered to constitute the main body of a
book-index of small or medium size. As we have pointed
out, large indexes present a more varied array of grammatical
forms; but, on the other hand, these occur less
frequently, so that they do not alter considerably the
rank-order of grammatical forms presented in Table 5.


TABLE 5. The 10 higher-ranking grammatical forms occurring
in a sample of 10 book-indexes

Grammatical form                        Average percent of total
of index entry                          number of entries
 1. Noun                                     34.51
 2. Adjective-noun                           16.13
 3. Noun-noun                                12.80
 4. Noun-preposition-noun                     6.40
 5. Adjective-noun-noun                       3.28
 6. Adjective-noun-preposition-noun           2.44
 7. Noun-preposition-noun-noun                1.96
 8. Gerund                                    1.69
 9. Noun-noun-noun                            1.57
10. Present participle-noun                   1.35
                                             82.13


• Conclusions


Our results show that subject-indexes of different
books have very little in common. This is quite surprising,
especially for our sample of 10 books that not only
belong to the same subject field and have the same
orientation, but are also close in date of publication.
This latter fact is of some significance in favoring the
assumption that these 10 books revolve around the same
problems and concepts and that they were indexed on
the basis of the same current practices. How can one
explain the striking differences exhibited by their indexes?
We can point to several factors that may account for their variability.
If the subject-index were merely an alphabetical list of
key-words, books belonging to the same field and treating
the same subject would probably share a considerable
number of identical index-entries. Key-words, in fact,
are drawn from the pool of technical words
characteristic of a particular field, which is rather restricted
in size and includes primarily standardized words.
It is true that a certain number of index-entries are
single nouns, and are therefore to be considered equivalent
to key-words. But one-word entries, although accounting
for a considerable portion of the total number
of entries in small indexes, are only a minor part of
medium or large indexes. In large indexes, the percentage
of one-word entries goes down to 5% or less. When
we compare book-indexes of different sizes, then, the
matches generated by coincidence of one-term entries are
limited in number.
Most index-entries are formed by a noun together with
one or more modifiers. Although we have not attempted
to find out what the upper limit in the number of modifiers
is, we know that often index-entries are phrases of
considerable length. Changes in number, relative position,
and grammatical form of the modifiers provide a
varied array of combinations and permutations of index-entries.
We have seen that, even if we limit the analysis
to entries of between one and three terms and omit variations
in relative positions of the terms, we end up with a
list of 25 combinations, which exhaust only 92% of the
variations offered by our sample. The variability
of index-entries is further augmented by the addition
of generic nouns that have practically no retrieving
power but are used to indicate relationships between
two or more terms of the entry (e.g., cause,
effect, factor, similarity).


We have already pointed out in the section titled “Conditions
Favorable to Cumulation of Book Indexes” that
an important determinant of index-entries variability is
the different level of specificity used by different indexers
to index similar text passages. In some cases this difference
in specificity level is required by the particular situation.
A short index, for instance, may have room only for
generic entries. On the other hand, specific entries may
be appropriate for a small index, given the fact that the
subject field of the book is very restricted and only
specific problems or concepts are discussed in the text.
We are not in a position, at this stage, to define the ideal
level of specificity for a book-index. All we can do is to
point out the problems created by large discrepancies in
level of specificity of index-entries. Consider, for instance,
the low probability of finding an exact match, either in
another book-index, or in a query, for the following
index-entries: (a) “preexperimental habits and difficulty
in assessing commonality of behavioral laws;” or (b)
“operational definitions as basis for taxonomy of learning.”
(These are two not atypical index-entries in a
recent volume publishing the proceedings of a symposium
on the psychology of human learning.)

Another crucial factor responsible for variability of
index-entries is the selection from the text of those “significant
passages” that deserve to be indexed. This depends
so much on the competence of the indexer, on his
personal judgment, his attitude, and the way he conceives
the process of indexing, that it is not surprising if
the results are at times hardly comparable with those
produced by another indexer. We can refer to recent
studies on reliability of indexing which show a rather low
agreement among subjects (and in the same subject at
different times) in selecting “representative” sentences
from scientific articles (9, 10).

We must remember that differences in methods of
indexing are not the only determinants of book-indexes
variability. Obviously, differences inherent in the books
themselves (i.e., difference in text) are responsible for
differences in the indexes. There are many ways in
which books may differ. If we confine our analysis to
scientific books, we can detect three main types of text
differences responsible for variations in subject-indexes.
One is the orientation of the book (by which we mean
primarily the kind of readers for whom the book is
written); the second is the level of specificity of the
text; and the third is, of course, the subject of the book.
The first variable can perhaps be equated with the degree
of technicality of the text. A book of zoology written
for grammar school students is certainly less technical
than a textbook of zoology for college students. On
the other hand a textbook of zoology may be as technical
as a book on protozoa, but probably is less specific (at
least on the topic of protozoa).

The subject of books is the main determinant, naturally,
of index variability. Only in books from the same
or partially overlapping fields can we expect to find
similarity of index-entries.

We can summarize the preceding discussion by saying
that we can detect six factors affecting book-indexes
variability, three of them related to index production,
the other three to text production:


Related to index production
Size and grammatical form of entries
Level of specificity of entries
Selection of text-passages to be indexed


Related to text production
Level of specificity of text
Orientation of book
Subject of book

As an answer to the question raised in the introduction,
a cumulation of book-indexes, although feasible, may not
be so simple and practical as it might have been if the
overlap among book-indexes had been found to be
greater. Our findings suggest further study into the relationships
among index entries in order to estimate the
limiting efficiency with which a cumulation of book
indexes could be produced and used.

Computer-Produced Microfilm Library Catalog


The philosophy, production, and cost-effectiveness of
a computer-generated library catalog are described.
This catalog is unique in that it utilizes direct
computer-to-microfilm composition techniques, employing the
Stromberg Carlson 4020. Cost, user acceptance, and
by-product capabilities are stressed.


W. A. KOZUMPLIK and R. T. LANGE
Technical Information Center
Lockheed Missiles & Space Company
Sunnyvale, California


Production of the library catalog depends on the correct
blend of the following ingredients: capability, cost,
and user acceptance. This axiom was tested in the
Nineteenth Century when the card catalog displaced
catalogs in book form.

With the advent of computer technology, the catalog
has come full circle within a century. To be sure,
the first catalogs produced by computers were in
card form, which was a natural evolutionary advance
when passing from manual to automated systems.

Irrespective of format, the point to be stressed is that
each of these computer-produced catalog systems,
whether card or book, was required to stand the test of
capability, cost, and user acceptance. And it is precisely
this test that the third catalog system passed
convincingly.

This third system is the computer-produced library
catalog in microfilm form. It is a product of off-the-shelf
hardware and of programming excellence. Administrators
of large specialized libraries as well as directors
of research will be particularly interested in the microfilm
system because it is the least costly and the most
advanced, effectively user-oriented catalog system¹ in
operation today.

1 On-line real time dialog library catalog systems, admittedly
more powerful, are still too costly.

Such a system was installed in the Technical Information
Center (TIC) of Lockheed Missiles & Space Company
(LMSC) and became fully operational in July
1966. The capabilities of this new system far outshine
its predecessors and will be discussed in detail here.
Besides achieving these advanced capabilities, the present
system actually returns a moderate savings in comparison
to its forerunners. Having passed the capability and
cost tests, the system was presented to the public (the
operational as well as the administrative user, that is, the
scientist on the one hand and the librarian on the other)
with favorable results. The user found that look-up
time was greatly reduced and that the system was easy
to operate.

Lockheed's experience has affirmed user acceptance to
be broadly based and has brought requests to TIC
management to install a microfilm catalog in R&D
oriented buildings. This can be done at no extra computer
costs and is beneficial whenever high-priced scientists
and engineers are located at a considerable
distance from the library. Obviously convenience of
look-up on the premises of his own building will
revolutionize the researcher's work and should improve the
quality of his product and prevent unwanted duplication;
this, after all, is the raison d'être of the special
library. The point to be emphasized is that a catalog
that is located a few feet from the researcher's work
station dynamically improves his accessibility to the
company's cataloged literature resources.²

The computerized catalog system installed and operat- 
ing at LMSC delivers the following products in accord- 
ance with design requirements: 


1. An updated library catalog in microfilm form
2. A listing of new publications added, using the
   keyword-in-title (KWIT) format
3. An updated report of open-entry items contained
   in the library
4. A source authority list, with appropriate
   cross-references
5. A subject authority list, with appropriate “see” and
   “see also” references


2 The increased volume of use and reuse of cataloged information, of
course, inevitably introduces the problem of logistics; that is, the need
to acquire or generate multiple copies to satisfy simultaneous multiple
user needs.


These products are processed quarterly with the exception
of the KWIT, which is issued semi-monthly. Basic
to all products is the system-derived and magnetic-tape
stored master file on all publications contained in the
library together with the ability to delete, add, or change
records on the master file as determined by the controlling
organization, namely, the library management.

It is worthy of note that these products listed can be
delivered for several separate libraries within the same
computer-processing cycle. At LMSC, for instance, these
products are currently separately generated for two
collections while two additional collections are in the
process of converting to this low-cost retrieval system.


The system originates with source documents being
keypunched and forwarded to the computer (Fig. 1).
These records are of various types, the most predominant
being catalog additions and deletions. The cards
are generated onto magnetic tape and sorted. The input
transactions are subjected to certain editing requirements,
reformatted, and exploded into various multiple
records. This explosion is based upon the number of
tracings in each document. During this phase of the
operation, the documents that pertain to new publications
are also generated on a separate tape that produces
the keyword-in-title listing.
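
The explosion step can be pictured with a small sketch. It is an illustration under stated assumptions, not the LMSC program: the field layout of `Transaction`, the function name `explode`, and the sample data are hypothetical. The one idea taken from the text is that a single input transaction yields one output record per tracing.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transaction:
    """One keypunched catalog addition (hypothetical field layout)."""
    source: str
    title: str
    report_no: str
    authors: List[str] = field(default_factory=list)
    contract: str = ""
    subjects: List[str] = field(default_factory=list)


def explode(t: Transaction) -> List[Tuple[str, str, str]]:
    """Return one (section, sort_key, report_no) record per tracing."""
    records = [
        ("source", t.source, t.report_no),
        ("title", t.title, t.report_no),
        ("report_no", t.report_no, t.report_no),
    ]
    records += [("author", a, t.report_no) for a in t.authors]
    if t.contract:  # a contract tracing exists only for contract reports
        records.append(("contract", t.contract, t.report_no))
    records += [("subject", s, t.report_no) for s in t.subjects]
    return records


# Made-up document with one author, no contract, and two subjects.
example = Transaction(
    source="EXAMPLE CORP.",
    title="A SAMPLE REPORT ON HEAT SHIELDS",
    report_no="EX-0001",
    authors=["DOE, J."],
    subjects=["HEAT SHIELDS", "ABLATION"],
)
exploded = explode(example)
print(len(exploded))
```

The number of output records grows with the number of tracings, which is why the master file holds many more look-up points than there are documents.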

The next step in the system operation is to sort the
edited-exploded transactions into the same sequence as
the master file. These sorted transactions are processed
against the master file to produce the updated catalog
and an updated master file. Also during this pass, a
tape is generated that produces the source and the subject
authority listings.
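
Processing sorted transactions against a sorted master file is the classic sequential file update. The sketch below shows that general technique in Python; it is not a reconstruction of the LMSC program. The record shapes, the `"add"`/`"delete"` action names, and the rule of at most one change per key are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, str]        # (sort_key, payload)
Change = Tuple[str, str, str]   # (sort_key, action, payload)


def update_master(master: Sequence[Record],
                  changes: Sequence[Change]) -> List[Record]:
    """Merge two inputs already sorted by sort_key in a single pass."""
    out: List[Record] = []
    i = j = 0
    while i < len(master) or j < len(changes):
        # Copy master records that sort before the next change.
        if j == len(changes) or (i < len(master)
                                 and master[i][0] < changes[j][0]):
            out.append(master[i])
            i += 1
            continue
        key, action, payload = changes[j]
        j += 1
        matches = i < len(master) and master[i][0] == key
        if action == "add":
            if matches:          # an add on an existing key replaces it
                i += 1
            out.append((key, payload))
        elif action == "delete":
            if matches:          # drop the old record; ignore if absent
                i += 1
        else:
            raise ValueError(f"unknown action: {action!r}")
    return out


old_master = [("A", "a"), ("C", "c"), ("E", "e")]
sorted_changes = [("B", "add", "b"), ("C", "delete", ""), ("E", "add", "e2")]
new_master = update_master(old_master, sorted_changes)
print(new_master)
```

Because both inputs are read once from front to back, the method suits tape, where random access is impractical.
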
The user-related visible components of the computer-produced
microfilm library catalog system are microfilm
cartridges and a microfilm reader together with the
semi-monthly KWIT entitled New Reports & Books.
The catalog's 1,051,060 look-up points or entries are
organized in six sections: source, title, author(s), contract
number, subjects, and report numbers/call number
(Figs. 2, 3, 4, 5, 6, 7, 8). Both reports and books, which
heretofore had been cataloged and shelf-ordered according
to separate systems, resulting in separate catalogs, are
now for the first time integrated into a single catalog.
The 16 mm microfilm compressing these million-plus
retrieval points is loaded into 40 cartridges; each cartridge
contains 100 ft of film on which are exposed 1,800
two-column pages of computerized catalog text processed
by the SC 4020. Each page contains approximately
14 entries. Altogether, this is a significant compression
of text and space since the million-plus records when in
card form had previously occupied 720 standard library
catalog drawers. The cartridges are housed in an
80-compartment rack that stands next to the reader on a
60 x 30-in. work table (Fig. 9). Each cartridge is labeled
as to contents. The labels are colored differently for
visual ease in distinguishing the six separate
sections of the catalog.
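
The capacity figures quoted above can be cross-checked with a few lines of arithmetic. The sketch uses only numbers given in the text; the variable names are introduced here for illustration.

```python
CARTRIDGES = 40
PAGES_PER_CARTRIDGE = 1_800
ENTRIES = 1_051_060
CARD_DRAWERS = 720

total_pages = CARTRIDGES * PAGES_PER_CARTRIDGE      # pages in one catalog
entries_per_page = ENTRIES / total_pages            # implied page density
drawers_per_cartridge = CARD_DRAWERS / CARTRIDGES   # drawers replaced per cartridge

print(total_pages, round(entries_per_page, 1), drawers_per_cartridge)
```

The implied density of about 14.6 entries per page agrees with the stated figure of approximately 14, and each cartridge stands in for 18 card drawers.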


Any on-shelf automatic microfilm reader and associated
cartridges may be used to display the microfilmed
text. For the installation at LMSC the Bell & Howell
Microfilm Reader, Model 531, was selected because it
provides not only visual reading comfort with its zoom
lens and its three intensities of lamp brightness but also
speed. The latter is derived from the use of the Bell &
Howell patented automatic no-rewind cartridge. The
user simply removes the cartridge after his look-up; the
next user of that cartridge does not have to rewind the
film but merely commences his search to the left or
right as the case may be. The zoom lens enlarges text
size up to 100%.

A complete, cumulated microfilm catalog
is produced quarterly. Computer processing time
together with duplicate microfilm processing, label
generation, cartridge loading, and distribution to the operating
location takes ten work days. The decision to
schedule production on a quarterly basis rather than
bi-monthly or even monthly was founded on production
costs and computer availability. For instance, the
bi-monthly production cycle would cost $3,000 more
annually.

Between periodic microfilm catalog runs,
users are kept informed of titles added to the collections
by the semi-monthly computer product in KWIT format,
New Reports & Books. This is a variant of Bell Telephone
Laboratories' BEPIP program and is structured in
two parts, Title and Bibliography. It is not a retrospective
retrieval tool of any great effectivity but serves
basically to announce works newly added during the
period reported to the inventory of literature resources
that are available to qualified users. For the user whose
approach is by subject, author, or contract number, the
three-month gap in currency of the catalog denies him
access to the latest information in the inventory. But
when the user's approach is by source, report number,
or title, the KWIT is moderately helpful; consequently,
this type of user will be but mildly adversely affected
in his exploitation of the most recent resources.
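
The idea behind a keyword-in-title listing can be shown in a few lines. This is a generic sketch, not the BEPIP program or the layout of New Reports & Books, neither of which is described here in enough detail to reproduce. The stop-word list, the function name `kwit_index`, and the sample titles are assumptions.

```python
from typing import Dict, List, Tuple

# Words too common to be useful look-up points (assumed list).
STOP_WORDS = {"A", "AN", "AND", "AT", "FOR", "FROM", "IN", "OF", "ON", "THE", "TO"}


def kwit_index(titles: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (keyword, title, report_no) rows sorted by keyword, then title."""
    rows = []
    for report_no, title in titles.items():
        words = {w.strip(".,;:()").upper() for w in title.split()}
        for word in words:
            if word and word not in STOP_WORDS:
                rows.append((word, title, report_no))
    return sorted(rows)


# Hypothetical report numbers and titles.
new_titles = {
    "X-1": "Thermodynamic Properties of Steam",
    "X-2": "Properties of Saline Water",
}
listing = kwit_index(new_titles)
for keyword, title, report_no in listing:
    print(f"{keyword:<15}{title:<36}{report_no}")
```

Every significant title word becomes an entry point, which is why such a listing serves a title approach well but offers nothing to the user who starts from an assigned subject heading or a contract number.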


Queuing problems caused by multiple simultaneous
catalog utilization are forestalled by installing multiple
readers and catalogs at a ratio of two to one for the
library's clientele and one to one for library technical
services staff. The cost of these added equipments and
components is more than offset by savings derived from
text compression. In actuality, the savings in compression
paid for the four catalogs and eight readers installed on
library premises for use of scientists and engineers and for
the five catalogs and five readers for use of library staff.
In addition, the microfilm system operates at a net saving
of $13,000 annually because (1) card filing costs are
avoided; (2) there are no catalog cases to purchase; and
(3) there is a 200% saving in space to house the microfilm
installation.


Fig. 1. LMSC Library Catalog, data system flow


American Documentation, April 1967


Fig. 2. Sources (sample page from the source section of the microfilm catalog)


Fig. 3. Titles (sample page from the title section of the microfilm catalog)


Fig. 4. Authors (sample page from the author section of the microfilm catalog)


Fig. 5. Contract numbers (sample page from the contract number section of the microfilm catalog)
Fig. 6. Subjects (sample page from the subject section of the microfilm catalog)


Fig. 7. Report numbers (sample page from the report number section of the microfilm catalog)
Fig. 8. Book call numbers (sample page from the call number section of the microfilm catalog)


Fig. 9. Microfilm catalog installation


The computer-produced microfilm catalog provides bonuses that are immediately attractive to librarians. Perhaps the chief bonus is a separate catalog with reader for the library’s technical services staff. This point is the more important at LMSC because its TIC has separate staffs for reports and for book acquisitions and cataloging. Having the complete catalog in their own work stations obviously permits the technical services librarians to function more efficiently.

Similar bonuses that have equally beneficial effects on the efficiency of library operations are the separate catalog/reader installations for the literature search corps and the reports and books circulation desks at one of the TIC's two libraries; namely, the one that serves approximately 11,051 scientists, engineers, and administrative support personnel. The needs of literature search are self-evident, and the volume of loan requests handled by the two service desks as well as their distance from the public catalog installation made it operationally and economically feasible to install these catalogs.

Additional bonuses generated by this computer-produced microfilm library catalog system are: (1) authority list of sources; (2) authority list of subject headings; and (3) list of open-entry items. While these lists are products of the system's specifications, they can be identified as bonuses in relation to their nonexistence under the displaced system. There is no need to stress the operational importance of the first two lists (Figs. 10 and 11), especially since they are automatically updated to reflect professional decisions of deletion and addition as well as of augmentation by integrative cross-referencing. Operational utility is enhanced by printing sufficient copies to supply a set to each cataloger and a set for the literature search corps. The third list (Fig. 12) identifies the TIC’s holdings of cataloged “serial” titles. Included in this concept are reports generated on a specific contract, project, task, or other effort which are uniquely identifiable by the same report number in extension, e.g., LMSC 1481-2, LMSC 1481-3, etc. Such report titles are listed but once in the official microfilm catalog and then with the notation: “See librarian for holdings.” The librarian consults her open-entry list and satisfies the requestor as to holdings.

The most far-reaching spin-off of the computer-pro- 
duced microfilm library catalog system, however, is its 
power to deliver a printed book catalog at exceptionally 
low costs. The savings reside chiefly in the absence of 
photographic expenses and of press set-up costs. The 
library administrator, whose clientele would object to 
using a microfilm catalog, could use the computer- 
produced microfilm system as the printing base for his 
book catalog since it cuts printing costs by two-thirds 
(Table 1). 
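The two-thirds figure can be checked against the 20-copy totals in Table 1. The sketch below simply re-adds the line items; the variable names are illustrative, and the Multilith plate cost is taken as $642.60, the value that the printed total implies.

```python
# Line items for a 20-copy run, transcribed from Table 1 (dollars).
# Assumption: Multilith plates are $642.60, which is what the printed
# total of $1223.20 requires.
copyflo_items = {"plates": 136.80, "press run": 216.00,
                 "collation": 36.00, "binding": 20.80}
multilith_items = {"plates": 642.60, "press set up": 343.80,
                   "impressions": 180.00, "collation": 36.00,
                   "binding": 20.80}

copyflo_total = round(sum(copyflo_items.values()), 2)
multilith_total = round(sum(multilith_items.values()), 2)

# Fraction of the Multilith cost saved by printing from the microfilm master.
saving = 1 - copyflo_total / multilith_total

print(f"Copyflo ${copyflo_total:.2f}, Multilith ${multilith_total:.2f}, "
      f"saving {saving:.1%}")
```

On these figures the Copyflo run costs about one-third of the Multilith run, which is consistent with the two-thirds saving stated in the text.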


Table 1. Comparative printing costs of library book catalog processed, A, from microfilm master (using Copyflo process) and, B, from computer print-out (using IEK process), based on 1,800 pages printed head-to-head, simple binding, each volume 300 pages

                        A. Copyflo             B. Multilith Printing
Operation          20 Copies    1 Copy        20 Copies     1 Copy
Plates              $136.80       -            $642.60      $642.60
Press Set Up           -          -             343.80       343.80
Bond Paper             -        $73.80            -            -
Press Run            216.00       -               -            -
Impressions            -          -             180.00         9.00
Collation             36.00      3.60            36.00         3.60
Binding               20.80      4.16            20.80         2.08
Total               $409.60    $81.56         $1223.20     $1001.08
[Figure 10: page 64 of the computer-printed "Sunnyvale Source Headings" list, giving corporate source headings (Weston Instruments, Inc. through Zenith Radio Corp.) with posting counts and SEE / SEE ALSO cross-references.]

Fig. 10. Source authority list


78 American Documentation — April 1967 


[Figure 11: page 241 of the computer-printed "Sunnyvale Subject Headings" list, giving subject headings (Electrostatics through Elsevier Monographs) with posting counts and SEE / SEE ALSO cross-references.]

Fig. 11. Subject authority list

[Figure 12: page 15 of the computer-printed "Open Entry Report," listing for each serial report number in the LMSD series its location, access code, and issue-by-issue holding information, including notes such as "not issued," "not received," and "title varies."]
Fig. 12. Open-entry list

Coding and Tabulating Machine Processing of Physical Signs in Toxicity Tests

A system utilizing relatively simple machine methods is described for processing physical signs occurring in subacute and chronic toxicity tests in dogs and rats. Signs are coded by organ system and specific sign within organ systems. These codes are then entered on IBM cards. Signs that are not included in the code notation are entered in natural language. At the termination of the experiment, signs are printed out by sign and dosage group or by animal and dosage group. The system has simplified the analysis of the experiments and aided in evaluating the dose relationship of signs.

R. W. REICHARDT, S. P. SHER, R. FORD, R. B. ANDERSON, and E. E. VOGIN†

Merck Institute for Therapeutic Research and
Merck Sharp & Dohme Research Laboratories
West Point, Pennsylvania

† Present address: Food & Drug Research Laboratories, Inc., Maspeth, New York.

• Introduction

Daily observations of the clinical condition, physical appearance, and behavior of animals used in subacute and chronic toxicity studies of compounds are made over periods extending from a few weeks to many months. The recording of these observations results in the accumulation of a considerable amount of data which must be analyzed and summarized in a form suitable for presentation as a final report. Hand processing these data has been a formidable task, since the number of animals may vary from a few dogs and rats in a subacute experiment to as many as 32 dogs and 200, or more, rats in a chronic experiment. The present report describes a system utilizing relatively simple machine equipment to process the data so that the occurrence of physical signs in rats and dogs may be more effectively analyzed and evaluated.

• Code Notations

An alphanumeric code for physical signs has been devised so that organ systems are designated by a number, and specific signs, referable to that organ system, are designated by a letter. Thus, 6 refers to the gastrointestinal tract and 6F and 6G signify tarry stools and emesis, respectively. A second digit refers to the number of days per week a particular sign occurred; thus, 6F2 signifies that tarry stools occurred on 2 days. The codes are similar for rats and dogs, but more positions have been designated for dogs because a greater variety of signs can be discerned in this species. The complete code for dogs is shown in Fig. 1. Within each organ system a “Z” designation exists for signs that are not otherwise specified in the code. A separate registry of “Z” codes in natural language is maintained. If a particular “Z” designation becomes significant, it is assigned an alphanumeric code.
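The notation can be illustrated with a small decoder. This is only a sketch of the scheme as described above, not the authors' tabulating procedure: the function and table names are hypothetical, the sign table is abridged to the two signs named in the text, and a days digit of 1 to 7 is assumed.

```python
import re
from typing import NamedTuple, Optional

# Organ-system digits 1-8 as listed in the dog sign code (Fig. 1).
ORGAN_SYSTEMS = {
    "1": "CNS", "2": "Eye/Ear", "3": "Skin", "4": "Respiratory",
    "5": "Cardiovascular/Renal", "6": "Gastrointestinal",
    "7": "Reproductive", "8": "Musculo-Skeletal",
}
# Abridged: only the signs spelled out in the text.
SIGNS = {"6F": "Tarry Stools", "6G": "Emesis"}


class CodedSign(NamedTuple):
    system: str
    sign: str
    days: Optional[int]  # days per week, when the second digit is present


# One digit (organ system), one letter (sign), optional days-per-week digit.
_CODE = re.compile(r"^([0-9])([A-Z])([1-7])?$")


def decode(code: str) -> CodedSign:
    """Split a code such as '6F2' into organ system, sign, and days."""
    match = _CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a sign code: {code!r}")
    digit, letter, days = match.groups()
    system = ORGAN_SYSTEMS.get(digit, "unlisted system")
    if letter == "Z":
        sign = "not otherwise specified"
    else:
        sign = SIGNS.get(digit + letter, "unlisted sign")
    return CodedSign(system, sign, int(days) if days else None)


print(decode("6F2"))
```

Keeping the "Z" case separate mirrors the registry described above: a write-in sign stays in natural language until it is frequent enough to earn its own letter.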

The animals are individually housed and observed at 
frequent intervals each day. The physical signs and be- 
havior of each animal are recorded daily in an official 
laboratory book. Animals exhibiting no overt signs are 
designated as “appears normal”; and in those studies in 
which a drug is administered at specified intervals (once 
daily, twice daily, etc.), signs are recorded before and 
after drug administration. Signs are coded at weekly 
intervals and transmitted to data processing. 

The laboratory sheets are designed for an ordered 
system. Therefore, these sheets are arranged sequen- 
tially by animal number and are maintained in that order 
throughout the study. Similarly, IBM cards are kept 
in the same order.

1 - CNS
    A - Fine Tremor
    B - Coarse Tremor
  + C - Behavioral Changes
    D - Catatonia
    E - Increased Activity
    F - Decreased Activity, Sedation, etc.
    G - Convulsions
    H - Ataxia
    I - Loss of Righting Reflex
    J - Analgesia
  * Z - NOS

2 - Eye/Ear
    A - Ptosis
    B - Miosis
    C - Mydriasis
    D - Lacrimation
    E - Relaxation of nictitating membrane
    F - Scleral Injection
    G - Conjunctivitis
    H - Loss of Hearing
  * Z - NOS

3 - Skin
    A - Sores
    B - Masses
    C - Piloerection
    D - Loose Hair
    E - Alopecia
    F - Other Skin Changes
  * Z - NOS

4 - Respiratory
    A - Nasal Discharge
    B - Cyanosis
    C - Decreased Respiratory Rate
    D - Increased Respiratory Rate
    E - Dyspnea
    F - Panting
    G - Dry Nose and Gums
    H - Foul Odor of Breath
    I - Rales
  * Z - NOS

5 - Cardiovascular/Renal
    A - Hyperemia, Ears and Gums
    B - Blanched Ears and Gums
    C - Edema
    D - Hematuria
  * Z - NOS

6 - Gastrointestinal
    A - Abdominal Distension
    B - Abdominal Tenderness
    C - Soft Stool
    D - Diarrhea
    E - Frank Blood in Stools
    F - Tarry Stools
    G - Emesis
    H - Emesis - Blood
    I - Tenesmus
    J - Ptyalism
    K - Frequent Swallowing
    L - Anorexia
  * Z - NOS

7 - Reproductive
    A - Increased Mammary Size
    B - Lactation
    C - Increased External Genitalia
    D - Decreased External Genitalia
    E - Priapism
    F - Hyperemia, External Genitalia
    G - Vaginal Discharge - Clear
    H - Vaginal Discharge - Bloody
    I - Estrus
  * Z - NOS

8 - Musculo-Skeletal
    A - Muscle Spasm
    B - Muscle Tone Increased
    C - Muscle Tone Decreased
    D - Prostration
  * Z - NOS

0 - General States
    A - Poor Physical Condition
    B - Interim Sacrifice
    C - Found Dead
    D - Sacrificed Moribund
    E - Death Accidental
    F - Escaped
  * Z - NOS

+ = Write in sign.
* = Not otherwise specified.

Fig. 1. Dog sign code

• Layout and Program
Certain identifying information is given on each card:

1. TT Number. (Columns 3-8) Identifies the particular toxicity test, e.g., 65-0086 is the eighty-sixth subacute or chronic study conducted during 1965. Numbers are assigned sequentially.

2. Dose Code. (Columns 10-11) A two-digit designation indicates the dosage level and permits sequencing of cards according to dosage; e.g.:

    00 = control
    10 = low dose
    20 = middle dose
    30 = high dose

If a new dosage level is added, it may be coded in the proper dosage relation; e.g., 15 = dosage level between the low and middle dosage groups. The actual dosage utilized is recorded in the protocol for that experiment.

3. Animal Number and Sex. (Columns 13, 17, and 19)

4. Sign Identity. (Columns 68 and 69) A number and letter that designate the organ system and specific sign.

5. Set Identification. Each card contains a specified number of weeks of data. In rat studies, a set covers 9 weeks and in dogs, 6 weeks.
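The reason for spacing the dose codes by tens shows up when cards are put in sequence: a level added later sorts into place without recoding the others. The following is a minimal sketch under stated assumptions; the `SignCard` record, its field names, and the animal numbers are hypothetical, and only the fields needed for sequencing are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignCard:
    tt_number: str  # e.g. "65-0086"
    dose_code: str  # two digits, e.g. "00", "10", "20", "30"
    animal: str     # illustrative animal number
    sign: str       # organ-system digit plus sign letter, e.g. "6F"


def sequence(cards):
    """Order cards by dosage group, then animal number, then sign."""
    return sorted(cards, key=lambda c: (c.dose_code, c.animal, c.sign))


cards = [
    SignCard("65-0086", "20", "0107", "6G"),
    SignCard("65-0086", "00", "0101", "6F"),
    SignCard("65-0086", "15", "0110", "6L"),  # level added after the study began
    SignCard("65-0086", "10", "0104", "6F"),
]

print([c.dose_code for c in sequence(cards)])
```

Because the codes are fixed-width digit strings, plain string comparison gives the same order as the dosage relation.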


Program

See Fig. 2.

Observations are recorded daily on an official laboratory sheet in natural language. At weekly intervals, the observations are coded, and a Xerox copy of the coded information is forwarded to data processing. Punched cards are prepared using an 026 Printing Punch. One card is prepared for each sign. The sign is punched into the appropriate field for the current week. The cards are then sequenced: first by dosage group, next by animal number, and finally by sign.

The deck of newly prepared data is then collated with the file deck using a routine of “merge if [...]

1. Nonmatching cards representing the first appearance of the sign in a given animal.

2. Nonmatching cards representing the absence of a previously observed sign.

3. Matching cards representing signs that have been previously observed. The new card is filed in front of the earlier observation.

The matching cards are posted using an “alternate routine” with an 026 Printing Punch. The [...] card is passed to the reading station, and the [...] are removed from the deck using a 101 Statistical Machine (or a collator) and then discarded. The newly posted cards are merged with the nonmatching cards (1 and 2 above), and the file containing the new data is then back in original sequence.
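The collating step sorts cards into the three classes listed above. A rough software analogue is sketched below; it is not a description of the collator wiring. It assumes, for illustration only, that each deck maps a (dose code, animal, sign) key to a list of days-per-week counts, and it does not pad weeks in which a sign was absent.

```python
def collate(file_deck, new_deck):
    """Merge one week's new cards into the file deck.

    Both decks map a (dose_code, animal, sign) key to a list of weekly
    counts; that layout is an assumption of this sketch.
    """
    matching, first_seen, not_seen = {}, {}, {}
    for key, weeks in new_deck.items():
        if key in file_deck:
            # Class 3: sign seen before; post the earlier data onto the new card.
            matching[key] = file_deck[key] + weeks
        else:
            # Class 1: first appearance of the sign in this animal.
            first_seen[key] = list(weeks)
    for key, weeks in file_deck.items():
        if key not in new_deck:
            # Class 2: previously observed sign, absent this week.
            not_seen[key] = list(weeks)
    # Restore the original sequence: dose code, animal number, sign.
    merged = dict(sorted({**matching, **first_seen, **not_seen}.items()))
    return merged, matching, first_seen, not_seen


file_deck = {("10", "0104", "6F"): [2], ("20", "0107", "6G"): [1]}
new_deck = {("10", "0104", "6F"): [3], ("30", "0111", "6L"): [5]}
merged, matching, first_seen, not_seen = collate(file_deck, new_deck)
print(list(merged))
```

The superseded file-deck entry for a matching key is simply dropped from the merged result, which corresponds to discarding the old card once its data have been posted.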


• Use of Sign Data in Analysis of Drug Results

At the conclusion of the toxicity test or at an interim period, a printout of all signs is furnished. Two types of printouts are prepared:

1. Sequenced by sign code, dosage group, sex, and animal number, respectively.

2. Sequenced by dosage group, sex, animal number, and sign, respectively.

The use of the first printout (grouping of signs by dosage group) permits an evaluation of the relationship of sign frequency to dosage group. Thus, it is possible to detect quickly signs scattered through both control and drug-treated groups which are probably unrelated to treatment. In addition, this printout permits the investigator to detect any apparent dose relationship in the occurrence of any particular sign. In Fig. 3, the order of occurrence of sign 6L (anorexia) is dose code 30, dose code 20, dose code 10, dose code 40, dose code 00. The onset and duration of any given sign can be easily determined.

The second printout (Fig. 4), which lists all signs occurring in a given animal, permits the investigator to evaluate the physical condition and behavior of that animal. Furthermore, it is possible to follow either the progression or regression of the severity of treatment within a given animal and to note the possible relationship between signs observed.
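The two printouts differ only in the order of the sort keys. The sketch below shows both sequences and a simple per-dose tally of the kind the first printout makes easy to read off; the `Obs` record and the sample observations are invented for illustration and are not data from the study.

```python
from collections import namedtuple

# Hypothetical observation record; field names are illustrative.
Obs = namedtuple("Obs", "sign dose sex animal")


def by_sign(observations):
    """Printout 1: sign code, dosage group, sex, animal number."""
    return sorted(observations, key=lambda o: (o.sign, o.dose, o.sex, o.animal))


def by_animal(observations):
    """Printout 2: dosage group, sex, animal number, sign."""
    return sorted(observations, key=lambda o: (o.dose, o.sex, o.animal, o.sign))


def animals_per_dose(observations, sign):
    """Count distinct animals showing a sign in each dosage group."""
    groups = {}
    for o in observations:
        if o.sign == sign:
            groups.setdefault(o.dose, set()).add(o.animal)
    return {dose: len(animals) for dose, animals in sorted(groups.items())}


observations = [
    Obs("6L", "30", "M", "0111"),
    Obs("6L", "30", "F", "0112"),
    Obs("6L", "10", "M", "0104"),
    Obs("6G", "00", "M", "0101"),
    Obs("6F", "30", "M", "0111"),
]

print(animals_per_dose(observations, "6L"))
```

A sign whose tally rises with dose code suggests a dose relationship, while one spread evenly across control and treated groups probably does not.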


In addition to the sign code we are also entering rat body weight, dog body weight, food consumption, water intake, and urine output on IBM cards. Dog water, urine, and food data are entered daily, and weekly summaries are prepared. Since body weights are recorded once weekly, the single entry is, in effect, a weekly summary. The use of machine methods for these data has substantially reduced the amount of hand copying, as well as typing of final tables. Plans are under way to include machine processing of pathologic data.

[Figure 2: flow diagram with the labeled steps: original notebook; 026 punch; sign card, one sign per card; sequencing by dose, animal, and sign; master file.]

Fig. 2. Flow diagram of machine processing of sign data in toxicity tests

Fig. 3. Physical signs in dogs listed by signs and dosage group




[Figures 3 and 4: machine printouts for test 65-0086 with columns for TT number, dose code, animal number, and weeks of drug administration.]

Fig. 4. Physical signs in dogs listed by dosage group and animal number




Subject Searching with Science Citation Index: Preparation of a Drug Bibliography Using Chemical Abstracts, Index Medicus, and Science Citation Index 1961 and 1964*

A bibliography on the drug thalidomide was prepared through a search of Chemical Abstracts (CA) and Index Medicus (IM) for the years 1956-1964, which took 14.6 hours. This was compared with a similar bibliography prepared through a search of Science Citation Index (SCI) 1961 and 1964, carried out for an equal length of time according to search procedures described, in an effort to determine if and how SCI might be helpful in subject searches. A satisfactory procedure for manual search of SCI was developed. The cumulative number of references found through SCI plotted against time gave a convex curve, while corresponding data from the conventional indexes gave a linear response. For periods up to eight hours, SCI yielded more references than did either IM or CA. However, at 14.6 hours SCI did not produce all references obtainable through CA-IM. Each of the three indexes produced a high percentage of unique references, SCI's being the highest. SCI-1964's coverage of articles published in 1964 was more complete than was IM-1964's. CA-IM gave superior coverage of chemical papers, patents, and papers in the less common languages. SCI's coverage of pharmacological papers was superior to that of either CA or IM. In this search SCI and conventional indexes could be profitably used together; each produced a large number of references not to be found in the other. In this search SCI was not appreciably less efficient in retrieving drug references than were CA and IM; for short time intervals, it was more efficient. More general conclusions must await further investigation.

CAROL C. SPENCER

Institute for Advancement of Medical Communication
Philadelphia, Pennsylvania

* This paper is based on an M.S. thesis submitted to Drexel Institute of Technology, October 1966.

The work reported here is the result of a proposal to compare use characteristics, i.e., the time needed to search and the result of search, of Science Citation Index (SCI) 1961 and 1964 with Chemical Abstracts (CA) and Index Medicus (IM) for preparing a bibliography on a particular drug. Pharmaceuticals was chosen as the field to be represented because it was considered a good example of a multidisciplinary field. Thalidomide was chosen as the topic drug because its properties were well publicized, and the appearance of references to it was timely from the standpoint of coverage by SCI. The time period chosen for searching was from 1956 (the date of the first publication concerning the drug) through 1964 (the last year for which complete indexes for all three services were available at the outset of this project). The experiment was designed so that time to be spent searching SCI was determined by the time required to complete a thorough CA-IM conventional subject search for references to thalidomide. By entering the conventional indexes (CA and IM) with a chemical or generic name, one could expect optimum results from them; that is, there were no problems of ambiguity either in designating a name for what was wanted or in determining how it would have been indexed.

Some of the questions explored are: (1) How does searching SCI compare (in efficiency and output volume) with searching CA-IM for the same topic? (2) Can SCI produce references not found by CA or IM? (3) Is there any appreciable difference in the outputs? If so, what are the reasons? (4) What is the nature of any unique output? (5) Can SCI and conventional indexes be combined in some way to produce results superior to either type alone?


RELATED WORK

Garfield (1) has suggested using citation indexes for subject searching, but some think that the value of using them this way is yet to be proven. Martyn (2) states without qualification that, as a retrieval tool, Science Citation Index (3) is “not as efficient” as the more conventional indexes, however well it may function as an access tool.

Touloukian (4) argues for citations, but not necessarily for citation indexes, when he suggests using abstracting journals to locate recent papers, then using their bibliographies to trace additional papers. He has found this to be more efficient than searching the abstracting journals throughout the entire time period.

Waldhart (5), who compiled a bibliography on lasers using seven conventional indexes and SCI-1961, found that in both SCI and conventional indexes he could find references at the rate of three minutes per reference. He did not use the productive technique of “cycling” (using the bibliography of the starting reference to provide additional access points) but began with a single reference as his entry into SCI-1961 and collected references that cited it. Cycling might have lowered his production time below three minutes per reference but would have required access to the original journals.
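Cycling, as defined above, can be sketched as a small traversal: look up who cites the starting reference, then use that reference's own bibliography as further entry points. The function below is illustrative only; the two lookup tables stand in for the printed Citation Index and for the original journals, and the relevance test is a placeholder for the searcher's judgment.

```python
def cycle_search(start, cited_by, bibliography, is_relevant, max_rounds=2):
    """Collect references by citation lookup plus 'cycling'.

    cited_by:     reference -> papers that cite it (the citation index)
    bibliography: reference -> papers it cites (needs the original journal)
    Both tables and the round limit are assumptions of this sketch.
    """
    found = set()
    used_as_start = {start}
    frontier = [start]
    for _ in range(max_rounds):
        next_frontier = []
        for ref in frontier:
            # Citation-index step: who has cited this starting reference?
            for citing in cited_by.get(ref, ()):
                if is_relevant(citing):
                    found.add(citing)
            # Cycling step: the bibliography supplies new starting references.
            for earlier in bibliography.get(ref, ()):
                if earlier not in used_as_start and is_relevant(earlier):
                    used_as_start.add(earlier)
                    found.add(earlier)
                    next_frontier.append(earlier)
        frontier = next_frontier
    return found


cited_by = {"A": ["P1", "P2"], "B": ["P3"]}
bibliography = {"A": ["B", "X"]}
result = cycle_search("A", cited_by, bibliography, lambda r: r != "X")
print(sorted(result))
```

In this toy case paper P3 is reachable only because B, taken from A's bibliography, was used as a second entry point.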

Garfield, Sher, and Torpie (6) used the bibliographies 
of their primary references on DNA in order to com- 
plete the map of the citation network, but their concern 
was with the relationship between articles, rather than 
the accumulation of a large number of additional ref- 
erences in a limited time. 

Baker (7) used Index Chemicus as a source of primary references on alkaloids and then obtained additional references by feeding the primary references into SCI-1964. She did not use the cycling technique, and her concern was with the nature of the output rather than with the time required to produce it.


GENERAL CONSIDERATIONS

The main difficulties in using SCI as a subject index appear to be:

1. Unfamiliar format. Access to SCI is gained by means of a starting reference, i.e., a reference the searcher already knows, rather than by means of a subject heading, as in conventional indexes. The starting reference must have the author’s name spelled correctly in most instances and may require at least his initials to distinguish his works from those of other authors with the same surname; in SCI authors are listed alphabetically. Following each cited author is a list of his papers. After each of his papers is a list of other papers that cite it. In another section, the Source Index, the citing authors are listed alphabetically, together with the citations and titles of their papers, the type of article (review, editorial, etc.), and the number of references that the particular paper cites. The format is well illustrated and explained in the Institute for Scientific Information’s training publication (8). After a bit of practice with this manual and with the SCI itself, the unfamiliar format was no problem.

2. No direct subject approach. The lack of subject approach can be compensated for by consulting appropriate conventional indexes to obtain early landmark papers or recent reviews, if the searcher does not know the name of an author in the field.

3. Noise, irrelevant references. A hypothetical starting reference A may be cited, for example, by papers 1-10. Perhaps only 2 of these 10 actually deal with the aspect of paper A in which the searcher is interested; the others may refer to another aspect, or to a technique developed by the author of A which the citing author has applied in another subject area. If this should happen repeatedly, the searcher may accumulate a large number of references irrelevant for his purposes. Careful selection of a homogeneous starting reference (one which deals with only one subject) can minimize this problem. The Source Index to SCI will contain the full titles of papers 1-10 and will thereby give some indication of their probable relevance. Garfield (1) suggests that if noise is a problem with SCI, it is possible for the searcher to record only references that have cited two or more starting references, thus increasing the chance of relevance. In planning a procedure for using SCI most efficiently, one must block off or defer the questionably relevant leads (such as papers whose titles do not indicate positive relevance) at least until the more productive approaches (such as papers whose titles do indicate positive relevance) have been exhausted.

4. Excessive time consumption. The average working literature searcher or reference librarian must be very much aware of time; for routine searches he is unlikely to use an index that is excessively time consuming, however great its other advantages. Excessive search time may be a result of the noise factor and also of suboptimum search strategy. The various steps or operations within the search process must be correctly evaluated for their relative productivity and [...] for their use assigned accordingly.
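Garfield's noise-reduction suggestion in item 3, recording only papers that cite two or more starting references, amounts to a simple tally. The sketch below illustrates that filter; the function name, the lookup table, and the sample data are hypothetical.

```python
from collections import Counter


def cited_multiple(cited_by, starting_refs, min_hits=2):
    """Return citing papers that cite at least `min_hits` starting references.

    cited_by maps each starting reference to the papers that cite it,
    as read off the citation index; the table here is illustrative.
    """
    hits = Counter()
    for ref in starting_refs:
        # set() guards against a citing paper being listed twice under one entry.
        for citing in set(cited_by.get(ref, ())):
            hits[citing] += 1
    return sorted(paper for paper, n in hits.items() if n >= min_hits)


cited_by = {
    "A": ["P1", "P2", "P3"],
    "B": ["P2", "P3", "P4"],
    "C": ["P3"],
}
print(cited_multiple(cited_by, ["A", "B", "C"]))
```

Raising `min_hits` trades recall for precision: fewer references survive, but each is tied to more of the starting papers.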

The advantages of using SCI for — — 
would seem to be: 

1. No terminology problem. If the lack of — 
‘approach is a disadvantage, it also has favorable aspects. 
There is no need to guess how an indexer might have 
indexed the desired material if one knows the author 
and other (even fragmentary) information about one 
or more papers dealing with it. 

2. Interdisciplinary coverage. The overlapping areas 





between classic disciplines are not reliably covered by 
the conventional indexes; therefore SCI's complete cov- 
erage of the interdisciplinary journals such as Nature 
and Science is a valuable asset. 

3. Complete coverage of “covered” journals. Conven- 
tional indexes are necessarily selective in their coverage 
of many journals, and their criteria for selection and 
ion are by no means always obvious. Letters to the 







slighted. SCI indexes every item in — it covers, 
even errata notices. 


* Methods 


= SEARCH 


Chemical Abstracts (CA) was searched on the heading 
“Phthalimide, N-(2,6-dioxo-3-piperidyl)” from 1956 (the 
year of the first publication on thalidomide) through 
1964. This yielded 110 references in 5.6 hours. 
Index Medicus (IM) was searched on the heading 
“thalidomide” from 1963 through 1964, producing 275 
references in 6.5 hours. Since the heading “thalidomide” 
was not used before 1963, IM was also searched on the 
heading “hypnotics and sedatives” for the years 1956 
through 1962, and titles were scanned for the word 
“thalidomide” or its synonyms. (The probable incom- 
pleteness of this latter step may be estimated from the 
fact that of the 110 references from CA and the 275 
from IM 1963-64, 70% could be judged relevant from 
the titles alone.) The search of IM produced a total of 
370 references in 9 hours. 


SEARCH TIME “T” HOURS 


Since the time required to search both CA and IM 
from 1956-1964 (resulting in 429 different references) 
was 14.6 hours, according to the design of the experiment, 
that amount of time was taken as a unit for measuring 
the time used for the SCI search: T = 14.6 hours. (In all 
cases, search time included copying time.) 


SEARCH OF “SCIENCE CITATION INDEX” (SCI) 


See Figure 1 for details about the basic procedure used. 

Fig. 1. Flow diagram for SCI search procedure 

Criterion for relevance: In order to avoid subjective 
judgments, an article was considered relevant if its full 
text contained any mention of thalidomide or its 
synonyms. 

First Run (hereafter designated as SCI-1) 


Chosen as the starting reference was the first significant 
publication on thalidomide listed in CA: 

Kunz, W., N-phthalylglutamic acid imide: Experimen- 
tal studies on a new synthetic product with sedative 
properties, Arzneimittel-Forschung 6 (8): 426-430 
(1956) 

(Other relevant references were on hand in case this 
reference failed to be cited.) This starting reference was 
used as access into SCI 1961 and 1964. (SCI’s for 1962 
and 1963 have not been published.) 

Step 1. All papers that cited this starting reference 
were recorded. The references (author, journal, page, 
and year) were looked up in the Source Index of SCI to 
locate titles for the articles, to assess their possible rele- 
vance. When positive relevance was determined from 
the title, each relevant descendant (paper that cited the 
starting reference) became a candidate for access into 
SCI as a new starting reference, and so on, until no more 
references resulted. If positive relevance could not be 
determined from the title, the reference was placed in 
a tentative discard (Q-file) unless the reference was a 
review article (as indicated in the Source Index) or had 
appeared for the second time. If a reference was: 


1. A review article, it was not put in tentative dis- 
card, even if the title did not indicate relevance. 
All reviews were immediately consulted in the 
original to determine relevance. 

2. A candidate for tentative discard for the second 
time, that is, if it was a descendant of two relevant 
articles and therefore much more likely to be 
relevant than a paper descended from only one 
relevant reference, its original was immediately 
consulted. 


Members of these two classes of articles, if relevant, were 
immediately subjected to Steps 1 and 2. Review articles 
were unusually productive of additional references and 
were processed out of the usual order so as to obtain as 
many references as quickly as possible. At the point when 
no more references could be obtained from these papers 
citing the starting reference, the accumulation contained 
the original starting reference and a number of its 
descendants. 

Step 2. All of the relevant references so far accumu- 
lated by searching SCI were consulted in the journal in 
which they originally appeared. Their bibliographies 
were scanned, and all ancestor references (papers which 
these papers cited) which appeared relevant (either 
from their titles or from statements about them in the 
text of the article) were selected and recorded. These 
ancestor references were also candidates for access to 
SCI; their use as new starting references begins the 
process of “cycling.” Step 1 was repeated on the refer- 
ences located through Step 2 until no more references 
could be obtained. 

Step 3. When no more references could be obtained 
from iterating through Steps 1 and 2, those references 
that had accumulated in the tentative discard file were 
checked in the journal in which they originally appeared 
to determine their relevance. If relevant, they were also 
used to access SCI as above. 

Step 4. If time permitted, the most productive authors 
were looked up in both the Citation Index and the Source 
Index in hope of finding more relevant references. 

All this was done for 14.6 hours, time equal to that 
spent on the conventional-index search. Elapsed time 
and the number of relevant references obtained were 
recorded at the end of each work period. 
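
Steps 1 through 3 amount to a work-queue traversal of the citation network with a deferred "Q-file." The sketch below is a simplified illustration under stated assumptions, not the procedure as it was actually carried out by hand: the five callables are hypothetical stand-ins for the manual look-ups, Steps 1 and 2 are interleaved per reference rather than run in separate passes, and Step 4 and the time limit are omitted.

```python
from collections import deque


def sci_search(start, citers, title_hit, is_review, text_hit, ancestors):
    """Cycle through Steps 1-3 from one starting reference.

    All five callables are hypothetical stand-ins for manual look-ups:
    citers(ref)    -> papers citing ref (Citation Index)
    title_hit(ref) -> title shows relevance (Source Index)
    is_review(ref) -> Source Index marks ref as a review
    text_hit(ref)  -> full text mentions the subject (original consulted)
    ancestors(ref) -> apparently relevant papers cited by ref
    """
    relevant = {start}
    rejected = set()          # originals consulted and found irrelevant
    queue = deque([start])    # references waiting for access into SCI
    q_file = []               # tentative discards

    def accept(ref):
        if ref not in relevant:
            relevant.add(ref)
            queue.append(ref)

    def consult(ref):
        if text_hit(ref):
            accept(ref)
        else:
            rejected.add(ref)

    def cycle():
        while queue:
            ref = queue.popleft()
            for paper in citers(ref):              # Step 1: descendants
                if paper in relevant or paper in rejected:
                    continue
                if title_hit(paper):
                    accept(paper)
                elif is_review(paper):
                    consult(paper)                 # reviews are read at once
                elif paper in q_file:
                    q_file.remove(paper)           # second appearance
                    consult(paper)
                else:
                    q_file.append(paper)
            for paper in ancestors(ref):           # Step 2: ancestors
                if paper not in rejected:
                    accept(paper)

    cycle()
    while q_file:                                  # Step 3: empty the Q-file
        paper = q_file.pop(0)
        if paper not in relevant:
            consult(paper)
            cycle()
    return relevant


# Toy data, invented for illustration only.
CITERS = {"kunz": ["thalidomide trial", "review", "odd1", "odd2"],
          "thalidomide trial": ["odd1"]}
ANCESTORS = {"review": ["early note"]}
found = sci_search(
    "kunz",
    citers=lambda r: CITERS.get(r, []),
    title_hit=lambda r: "thalidomide" in r,
    is_review=lambda r: r == "review",
    text_hit=lambda r: r in {"review", "odd1"},
    ancestors=lambda r: ANCESTORS.get(r, []),
)
print(sorted(found))
```

In the toy run, "odd1" is consulted early because it appears twice, while "odd2" waits in the Q-file until Step 3 and is then rejected.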


Additional Runs 


In order to try other ways of using SCI, a second and 
third run were planned in order to see if a larger output 
could be obtained by using inputs other than one starting 
reference, while using the same basic procedure. 


Second Run (hereafter designated as SCI-2) 


This was the same as the First Run except for the 
initial input. For SCI-2 the input was the review articles 
obtained in a quick search of CA-1962-64 (for reviews 
only, as classified by CA) and their bibliographies. 


Third Run (hereafter designated as SCI-3) 


This was the same as the First Run except for the 
initial input. For SCI-3 the input was the entire product 
of the CA-IM search, 429 references. They were proc- 
essed in alphabetical order. Descendants of these starting 
references (from Steps 1 and 2) were checked first against 
the list of 429 CA-IM articles (List A), a quick, if in- 
complete way to determine relevance. If necessary, their 
titles were then checked in the Source Index, as in the 
First Run. In all other respects the procedure followed 
was the same as in the First Run. 


Compilation of Composite File From All Sources, CA, 
IM, SCI-1, SCI-2, SCI-3 


Reference cards from all five sources were pooled, and 
a composite file containing one card per reference was 
made on marginally-punched tabulating cards (9). On 
each card was recorded, coded, and punched each of the 
five sources through which the reference was located. 


For those articles not found through CA or IM, the 
indexes were rechecked to determine why they were not 
found: 


O: Indexed under other heading. (Found in author 
index.) 

J: Journal not covered. 

L: Indexed later, after 1964. 

S: “Selective coverage.” (Journal listed as covered, 
but article not found in author index in year of 
publication nor two following years.) 


This information was recorded, coded, and punched into 
the tabulating card. 
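
In present-day terms, each punched card is a small record: one reference, the set of sources that located it, and a reason code for each conventional index that missed it. The sketch below is only an illustration of that record layout; the class and function names are hypothetical, and the sample references are invented.

```python
from dataclasses import dataclass, field

SOURCES = ("CA", "IM", "SCI-1", "SCI-2", "SCI-3")
MISS_REASONS = {
    "O": "indexed under other heading",
    "J": "journal not covered",
    "L": "indexed later, after 1964",
    "S": "selective coverage",
}


@dataclass
class ReferenceCard:
    citation: str
    sources: set = field(default_factory=set)   # sources that located it
    missed: dict = field(default_factory=dict)  # index name -> reason code

    def record_miss(self, index, reason):
        # Only the two conventional indexes were rechecked for misses.
        if index not in ("CA", "IM") or reason not in MISS_REASONS:
            raise ValueError(f"bad miss entry: {index!r}, {reason!r}")
        self.missed[index] = reason


def pool(found):
    """Merge (citation, source) pairs into one card per reference."""
    cards = {}
    for citation, source in found:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        card = cards.setdefault(citation, ReferenceCard(citation))
        card.sources.add(source)
    return cards


# Invented sample: "ref A" found twice, "ref B" found only through SCI-2.
cards = pool([("ref A", "CA"), ("ref A", "SCI-1"), ("ref B", "SCI-2")])
cards["ref B"].record_miss("CA", "J")
cards["ref B"].record_miss("IM", "S")
print(len(cards), sorted(cards["ref A"].sources), cards["ref B"].missed)
```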

Each article was classified as to whether it was pri- 
marily chemical, clinical, or pharmacological. It was also 
classified as to whether it was: 


1. Case history or original study 
2. Review (as classified by SCI, CA, or IM) 
3. Patent 
4. Editorial 
5. Miscellaneous 


The language and country of origin were recorded, 
coded, and punched in the tabulating card for all articles. 
In IM, the original language is explicitly stated, if other 
than English. In CA, the original language is explicitly 
stated if (1) it is other than that which would be in- 
ferred from the country of origin of the journal, or (2) 
it could not be so inferred, as with Swiss journals. For 
articles from SCI runs, the original journal was consulted. 
References from SCI runs which could not be veri- 
fied or which proved incorrect were discarded. Twenty- 
one items were so discarded, yielding a total composite 
file of 632 articles. 


* Results 


EFFICIENCY 


Figure 2 shows the hourly outputs of the CA, IM, 
SCI-1 and SCI-2 searches, and Fig. 3 shows the cumu- 
lative number of references vs. search time for the same 
four searches. (SCI-3, a supplemental exhaustive search, 
is not included in the efficiency comparison since a time 
limit is not appropriate to this type search.) 

It can be seen from Figs. 2 and 3 that if one chose to 
spend 8 hours or less on a thalidomide search, SCI-2 
would yield the highest number of references in that time, 
and that most of those references could be obtained in 
the first 4 hours. 

SCI-1 proved superior in efficiency to the conventional 
indexes for up to 6 hours, most of the references being 
obtained in the first 3 hours. 

Table 1 shows the average time required to obtain 
one reference by each procedure. From this it appears 
that SCI-1 and SCI-2 were not appreciably less efficient 
than conventional indexes for a search on thalidomide; 
for the short search time intervals most often encountered 
in practical situations, they were more efficient. 
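
The averages for the completed conventional searches can be checked directly from the reference counts and search times given under Methods. The short sketch below does only that arithmetic; the function name is hypothetical.

```python
def minutes_per_reference(hours, references):
    """Average search time per reference, in minutes."""
    return hours * 60 / references


# Totals reported for the conventional-index searches.
ca = minutes_per_reference(5.6, 110)        # CA: 110 references in 5.6 hours
im = minutes_per_reference(9.0, 370)        # IM: 370 references in 9 hours
combined = minutes_per_reference(14.6, 429) # CA and IM: 429 references in 14.6 hours

print(round(ca, 1), round(im, 1), round(combined, 2))
```

The first two values round to the 3.1 and 1.5 minutes per reference shown for CA and IM; the combined figure comes out at about 2.04, within rounding of the 2.03 printed in Table 1.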





Fig. 2. Hourly output in references for CA, IM, SCI-1, and SCI-2 searches 

Fig. 3. Cumulative number of references vs. search time for CA, IM, SCI-1, and SCI-2 searches. Dotted lines indicate com- 
bined product of CA and IM searches. 


SCI-2 was more efficient than SCI-1 for locating refer- 
ences on thalidomide for the period of time covered in 
this experiment; in SCI-2, the review articles are proc- 
essed early and review articles are the richest source of 
new references. This effect is illustrated in Table 2, 
which shows which part of the search procedure produced 
the references in SCI-1 and SCI-2. The great productivity 
of the relevant review articles in SCI-2 emphasizes the 
value of the cycling technique (which requires examina- 
tion of the original article) and confirms the value of 
preferential treatment for possibly relevant review 
articles. The cycling technique adds to efficiency, but 
requires access to original journals, impossible in some 
situations. (However, Table 4 will indicate that quite a 
number of articles could be found using only SCI, Lancet, 
and British Medical Journal.) 


TABLE 1. Average time to obtain one reference 


Minutes/Reference (average) 


Hours CA IM SCI-1 SCI-2 IM-CA 
1 3.1 1.5 0.80 0.48 
2 3.1 1.5 0.71 0.69 
3 3.1 1.5 0.86 0.82 
4 3.1 1.5 El 0.91 
5 3.1 1.5 1.3 1.04 
6 1.5 1.5 1.47 
7 1.5 1.63 1.32 
8 1.5 1.75 1.44 
9 1.5 1.8 1.6 
10 2.0 1.72 
11 2.18 1.87 
12 2.37 2.04 
13 2.55 2.20 
14 2.73 2.33 
14.6 2.85 2.43 2.03 


RELATIVE COMPLETENESS 


Table 3 and Figure 4 show the overlap among indexes 
and the large amount of material covered uniquely by 
each index. 

Figure 5 indicates the relative completeness of the 
searches done if the composite total of 632 references is 
taken as 100%. This graph suggests that for an 
exhaustive search, a combination of CA-IM and SCI-2 
would give the most complete results. However, SCI-3 
(including CA-IM) might very well yield the most com- 
plete results if time were not limited. Ten references 
were obtained only through SCI-3, and eventually all 
references located by SCI-1 and SCI-2 would have been 
located by SCI-3, given enough time. 


TABLE 2. Productivity of various steps in SCI search procedure 

Point at which reference                  SCI-1         SCI-2 
was called relevant                     No.    %      No.    % 

From relevant reviews: 
  Relevance indicated in text 
  of review article                      97    32     269    77 

From Source Index: 
  Title contained word 
  "thalidomide" or synonym               31    10      30     9 

Original consulted: 
  Source Index indicated paper 
  was a review article                   11     4      11     3 
  Cited twice by known 
  relevant papers                        88    29       5     1 

Q-file: tentative discard                75    25      33    10 


TABLE 3. Overlap among indexes and unique coverage for each index 

                     Number of     Percent of 
Source               references    632 references 

CA only                  48            7.5 
IM only                 196           31.0 
SCI only                203           32.0 
(All unique)           (447)         (71.0) 
CA and IM                21            3.4 
CA and SCI               11            1.8 
IM and SCI              123           19.5 
CA, IM, and SCI          30            4.8 
Total                   632          100.0 

Total CA                110           43.6 
Total IM                370           53.0 
Total SCI               367           55.3 
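
The counts in Table 3 can be cross-checked by treating each row as one region of a three-set overlap diagram. The sketch below is offered only as such a check, using the reference counts from the table; the variable and function names are hypothetical.

```python
# Number of references in each region of the CA / IM / SCI overlap (Table 3).
regions = {
    frozenset({"CA"}): 48,
    frozenset({"IM"}): 196,
    frozenset({"SCI"}): 203,
    frozenset({"CA", "IM"}): 21,
    frozenset({"CA", "SCI"}): 11,
    frozenset({"IM", "SCI"}): 123,
    frozenset({"CA", "IM", "SCI"}): 30,
}


def total_for(index):
    """All references located through one index, shared or not."""
    return sum(n for members, n in regions.items() if index in members)


composite = sum(regions.values())
unique = sum(n for members, n in regions.items() if len(members) == 1)

print(composite, unique)
print(total_for("CA"), total_for("IM"), total_for("SCI"))
```

The region counts sum to the composite file of 632, the three "only" rows sum to 447, and the per-index totals come out at 110, 370, and 367, in agreement with the counts in the table.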


CHARACTER OF OUTPUT 


Table 4 shows that the vast majority (84%) of the 
thalidomide references were case histories or original 
studies. If they were not indexed by the conventional 
indexes, the usual reason was “selective coverage” (84% 
for CA, 56% for IM). 

In this sample of 632 items, chemical papers were well 
covered by CA (97% as compared to 18% for IM and 








Fig. 4. Overlap among indexes and unique coverage of CA, 
IM, and SCI for thalidomide search, number of references 


Fig. 5. Relative completeness of CA-IM, SCI-1, SCI-2, and SCI-3 searches. Composite total of 632 references is taken as 
100%. Search time for single searches at left was 14.6 hours; for the three combined searches at right, it was 29.2 hours. 


TABLE 4. Analysis of search product according to various attributes 


Reason not Reason not 
found in CA* found in IM* 
Total CA IM CA-IM SCI-1 SCI-2 SCI-3 O J L S O J L S 
Total references 632 110 370 429 302 348 203 15 65 8 434 47 31 39 145 
Type 
Case history—original study 525 91 320 363 260 304 173 9 49 7 369 41 14 32 118 
Review 26 10 7 14 11 15 7 6 1 9 4 5 7 3 
Patent 9 9 0 9 . 9 
Editorial 24 0 18 18 7 7 8 3 21 1 1 4 
Miscellaneous 48 25 25 24 22 15 13 35 1 2 20 
Field 
Chemical 31 30 4 3i 5 3 4 1 5 16 1 5 
Clinical 425 8 273 27 194 240 143 1 65 1 360 30 8 16 98 
Pharmacological 176 72 93 121 103 105 56 14 6 8 12 7 22 42 
Year 
1964 116 24 42 59 68 75 52 10 3 8 71 13 6 39 16 
1963 200 44 159 176 53 66 5 2 34 120 . 6 8 27 
1962 207 17 127 136 98 126 66 1 290 169 11 7 62 
1961 62 9 14 21 52 50 50 2 51 9 4 3b 
1960 22 10 14 20 10 10 10 1: 2 9 2 4 2 
1959 10 2 4 5 9 9 8 1 3 4 5 1 
1958 8 1 4 5 7 7 7 7 1 1 2 
1957 3 .2 2 3 1 1 1 1 1 
1956 4 1 4 4 4 4 4 1 1 
Language 
English 320 41 154 168 211 -225 149 12 17 4 246 25 10 26 105 
German 146 27 90 106 72 92 48 14 2 103 19 6 10 ` 21 
French 38 11 28 | 33 12 13 6 2 8 1 16 1 2 5 
Italian 30 9 30 84 2 3 1 5 21 1 5 
Other 92 22 68 88 5 15 21 1 48 14 1 9 
United Kingdom ` 108 9 62 65 118 128 92 2 5 1 146 á 1 7 89 
Germany | 121 22 78 90 63 76 39 L. 13 (d 84 17 6 4 16 
United States 108 18 6 64 7T 75 46 7 12 38 68 18 7 16 9 
Italy 37 10 30 3b 2 3 C 1 5 21 1 6 
France 30 9 23 26 8 10 3 1 8 12 1 6 
Switzerland 28 6 13 17 15 20 13 3 2 17 3 10 2 
Japan 25 10 1b 24 1 2 1 10 1 4 9 1 
Other 120 26 89 108 18 3 10 12 82 b 8 1 .17 
Times cited 
4-24 66 14 44 46 64 65 52 2 3 47 4 18 
3 37 5 2 R 34 36. 20 1 31 6 1 8 
2 70 9 36 37 59 64 30 d 4 1 55 4 1 29 
1 159 14 66 375 92 125 58 3 1i 2 129 17 9 4 63 
0 300 68 202 248 53 58 43 8 47 6 172 18 20 35 27 
Most productive reviews 
From: 8 2 1 2 7 8 7 2 1 3 1 2 4 
Lancet 69 2b 25 54 61 38 69 44 
Brit.. Med. J. 62 18 18 48 51 42 62 1 1 42 
Araneimstiel-Forsch. 18 11 10 16 12 13 10 1 6 4 8 1 
Can. Med. Ássoc. J. 14 13 13 4 6 2 14 1 
Deut. Med. Wochsch. 13 1 12 12 7 9 5 12 1 
Med. Klin. (Munich) 11 2 7 7 7 8 5 9 4 
Med. Welt 11 8 8 8 ` 8 4 11 2 1 
Muench, Med. Wochsch. 10 6 6 5 6 4 10 2 2 
Am. J. Obstet, Gynecol. 9 3 3 6 6 2 9 1 5 
Nature 8 1 5 5 5 5 3 1 8 3 
Science . 8 3 4 4 8 8 3 2 3 1 2 .1 


* O, other heading; J, journal not covered; L, indexed later; S, “selective coverage.” 

10% for SCI-2). Pharmacological articles received 41% 
coverage from CA, 53% from IM, and 60% from SCI-2. 
Coverage for clinical papers was 2% for CA, 64% for IM 
and 57% for SCI-2. 

.does not routinely include case histories and 
studies that appear in the form of letters to the editor. 
Since the most timely information on drug side effects 
and toxicity is very often found in short communications 
rather than in formal papers, an index that covers this 
material should not be ignored when conducting a drug 
search. 

Most of the anonymous material (editorials) was 
located through IM (75%). One of the most important 
publications in the drug field, the Annual Review of 
Pharmacology, was not covered by IM until 1965. An 
index that does cover this publication must necessarily 
be considered valuable to a drug search. 

SCI did not locate any of the nine patents that were 
located through CA. SCI did not locate papers in the 
less common languages as well as did the conventional 
indexes. While CA-IM produced 53% of the English 
articles and SCI-2 70%, and while CA-IM located 73% 
of the German articles compared to 63% for SCI-2, for 
all other languages combined the corresponding figures 
are — for CA-IM, and 19% for SCI-2. 

e early articles referring to what were later 
n da toxic effects, the relationship to the drug was 
not known at the time of reporting, nor was exposure to 
the drug mentioned, so the article could not possibly 
have been indexed under the drug name by conventional 
indexes. However, later reviewers who followed up these 
cases established that the drug was, or could have been, 
— and incorporated this information into their 





reviews. For example: Cases of acute myxedema of 
unknown cause were reported by both Kendall and 
. It was later established that these cases 
were probably due to thalidomide, which is mentioned in 








review articles by Gerarde and Mellin. The Gerarde 
review was located only through SCI. 
The 66 most heavily cited papers found on thalidomide 


e from SCI 32. 


(See Table 4): IM located 36% of the 1964 articles 
compared to 65% for SCI-2. IM indexed 39/116 of the 
1964 articles after 1964, almost as many as it indexed 
during 1964 (42/116). 


APPENDICES 


sources through which they were located. Appendix D 
shows a sample page of the Citation Index. Appendix 


E shows a sample page of the Source Index. Appendix F 
is a tabulation similar to Table 4 characterizing the 
papers found only by SCI. Appendix G is a similar tabu- 
lation characterizing the papers that were not cited. 
These appendices are available from the ADI Auxiliary 
Publication Service. 


* Discussion 


SCI was expected to appear to rather poor advantage, 
compared with subject indexes, on a compound name 
search because there were less than the usual number of 
terminology problems to reduce effectiveness and effi- 
ciency when searching the conventional indexes. For that 
reason any comparative merits of SCT for locating refer- 
ences as specified in the results and conclusions from this 
experiment are considered conservative. 

It must be recognized that this experimental search 
was only one of many kinds of searches which might 
have been done for illustration. There is no basis for 
comparing these results with those of workers cited in 
the introduction because there is no correspondence in 
methodology. The results presented here cannot be 
generalized in any way, but they represent an attempt 
toward an explicit, unambiguous case study that has 
brought to light a number of facts about each of the 
tools studied, and how they compare with one another. 

It remains to be seen if the same pattern appears with 
other compounds, other types of search questions, and 
other subject areas. 


* Conclusions 


For this search SCI and conventional indexes could 
be used together profitably; each produced a large 
number of references not to be found in the other. For 
this drug search SCI was not appreciably less efficient 
as a retrieval tool than were CA and IM; for short time 
intervals, it was more efficient. 

More general conclusions must await further investi- 
gation. 


A Rationale for Attacking Information Problems 


The “systems” approach to information system prob- 
lems is suggested, wherein problems arising from 
information origination, processing, and utilization 
(and alternative solutions to the problems) can be 
viewed as an entirety rather than piecemeal. Informa- 
tion utilization problems involve sociopolitical con- 
siderations (e.g., “wants” vs. “needs” of users), eco- 
nomic values of information, and the more objective 
considerations of timeliness, quality, and format re- 
quirements placed upon information services or prod- 
ucts. Quality is encompassed by the factors of 
specificity, completeness, and relevance. Information 
processing is shown to consist of seven distinct “unit 
processes,” which may be combined in only nine dif- 
ferent ways, thus defining nine possible types of 
information systems. The “unit processes” employed 
interact strongly with each other and with user require- 
ments. Information origination, specifically the in- 
creasing ratio of “dross” to “ore,” is stated to be 
the single major information problem for which rational 
means of attack are not apparent at present. 


EUGENE WALL† 


Lez-Inc. 
Rockville, Maryland 


† This paper was prepared while the author was an employee of 
Information Dynamics Corporation, Reading, Massachusetts. 


This paper attempts to develop a rationale against 
which information problems can be viewed and within 
which the problems can be defined and the solutions to 
the problems explicitly delimited as to generality. 

All problem-solving endeavors can be attempted at a 
number of different levels of generality, and the problem 
solutions developed by such endeavors are (usually) gen- 
erally applicable only to the extent that the problem- 
solving effort was itself generalized. That is, a problem 
can be narrowly defined by excluding the apparently less- 
central variables that might affect the validity of the 
problem solution. Then, unless the excluded variables 
are in fact constants, the problem solution may well be 
invalid in any general sense. The situation is even more 
serious if the excluded variables are unknown. Under 
these circumstances, even the limits of problem solution 
validity cannot be recognized. 

In engineering work, the approach to problem solution 
via excluding and perhaps not even defining the less- 
central variables has been called by some “methods en- 
gineering.” This approach solves specific problems with- 
out regard to their interactions. Thus each problem 
solution may be optimal for a part of the “problem 
complex,” but the combination of solutions is usually 
much less-than-optimal for the “complex” as a whole. 
That is, suboptimization has been achieved. 

A different approach has come to be called “systems 
engineering.” This approach (1) requires the identifica- 
tion of as many variables as possible (irrespective of 
their apparent degree of centrality to the problem at 
hand), the evaluation of their centrality, and the explicit 
choosing of the more central variables for consideration 
in problem-solving. Usually more variables are chosen 
in the systems approach than in the methods approach; 
i.e., both a broader and more detailed view is taken of the 
problem. Problem solutions developed via the systems 
approach are thus more generally applicable than those 
developed via the methods approach; further, the limits 
of applicability of the solutions are clearly defined. 

It is postulated that to date the methods approach has 
largely been followed in solving information problems. 
For example, we have carried out indexing effectiveness 
studies without a knowledge of what effectiveness means 
in terms of users’ needs. Then we have studied users’ 
needs without regard to the separation of needs from 
“wants”—“wants” based upon habit, ignorance of what 
could be had, etc. Many similar situations could be cited. 
It would seem that the time has come to apply the sys- 
tems approach to the solution of our science information 
problems. 

The first step of the systems approach, as noted, is 
to identify as many of the pertinent variables as possible, 
thereby permitting consideration of their degree of cen- 
trality to any problem at hand. In information work, 
there are many problems and many variables. This 
paper attempts to set forth the variables involved, as 
a first step toward defining a network of variables, differ- 
ent parts of which are applicable in varying degrees to 
different information problems. In fact, the problems 
facing us are definable in terms of interacting variables; 
i.e., the interaction of variables creates problems. 

As a second step toward defining this network of 
variables, we attempt herein to organize the variables in 
a rational manner, thus to make easier the detection of 
interactions among them and, as a consequence, the defi- 
nition of specific problems on a rational basis. Except 
for a few examples, the detection of interaction of vari- 
ables and the definition of specific problems is not 
attempted herein. Suffice it to say here that interactions 
may take place among any of the variables to be de- 
scribed later in this paper. The actual existence of such 
interactions, and defining their importance, is a task 
for further effort. 

The principal problem areas, with their respective 
variables, are connected intimately with the communica- 
tion process. All processes have an input, a processing 
operation, and an output. It is with these three phases 
of the communication process that the three problem 
areas are associated: input, processing and output. We 
might call them information “origination,” information 
“processing,” and information “utilization,” the three 
areas of issue in this field. Because the end to be 
achieved is that of purposeful information utilization, 
we shall examine the issues in reverse order: utilization, 
processing, and (finally) origination. For each of these 
issues, we shall attempt to set forth a framework for 
problem definition purposes. 


* Information Utilization 
GENERAL 


If the objective of information processing activities 
is the utilization of information to improve the cultural 
or material lot of mankind (or of a segment thereof), 
then it is essential that the true needs of information 
users be ascertained. By “true needs,” we mean those 
needs which would exist if economic and sociopolitical 
factors were not operational. It must be recognized that 
such factors, however, are operational. Just as mathe- 
maticians recognize that it is impossible to trisect an 
angle by formal techniques, so also must we realize that 
some apparent needs of users are unrealistic because of 
their dependence upon economic and sociopolitical influ- 
ences; and this comment applies to a spectrum of users 
ranging from those who really want their problem 
solved (not just the information useful in solving it) to 
those who think they need no information at all. Never- 
theless, we must determine as best we can what the 
users’ needs would be if economic and sociopolitical com- 
plications were absent; for only in this way can we have 
a benchmark against which to measure our progress to- 
ward satisfying users’ needs, a goal for which we can 
strive. Accordingly, a discussion of sociopolitical and 
economic considerations will precede the discussion of ob- 
jective (“technical”) considerations which define the true 
needs of users. 


SOCIOPOLITICAL CONSIDERATIONS 


There are undoubtedly cultural, sociological, political, 
psychological, and other similar, often nonobjective, ob- 
stacles to the satisfaction of the true needs of users. In 
order to detect some of these obstacles (as well as more 
objective problems in the economic and technical areas), 
scientific disciplines should be encouraged to undertake 
critical self-examinations. A large self-review (2) of the 
information exchange “culture” in the field of psychology, 
now four years in progress, is providing some startling 
insights into the patterns of communication in psychol- 
ogy. If these findings are matched in other fields, thus 
providing a basis for general conclusions, the socio- 
political complications of scientific communication may 
be sufficiently defined to permit their rational considera- 
tion as part of the communication problem. 

It may well be that such nonobjective considerations 
may make impracticable the development, in the near 
future, even of economically justifiable systems capable 
of meeting the true needs of users. Instead, it may be 
necessary to press gradually, in an evolutionary man- 
ner, toward the more ideal problem solutions, changing 
the sociopolitical climate little by little over a long 
period of time. 


ECONOMIC CONSIDERATIONS 


Today we know essentially nothing about how to 
measure the value of information to users. Under such 
circumstances, it is impossible to determine what costs 
can be justified in supplying information to users. The 
problem is intensified by the probabilistic nature of the 
value variable. 

Several approaches have been probed, tentatively, in 
attempts to measure the value to the user of informa- 
tion. For example, a number of organizations have shown 
that their organized information activities, as presently 
constituted, cost no more than if the user searched for 
information independently, and to the extent desired by 
the user. Such an evaluation, however, gives no consid- 
eration to the cost/savings ratio under more rational defi- 
nitions of user needs or with more optimally designed 


information services for the user. Another approach is 
that of the reductio ad absurdum, whereby it can be 
shown that if a scientific investigation costs $20,000, if 
the report thereon costs $2,000, and if the average re- 
port is referred to 10 times over its useful lifetime, then 
each reference costs about $200 or 1% of the cost of the 
investigation; surely the average user benefits more than 
1% from his use of the report. Such an evaluation is 
essentially circular, however, and provides only a sub- 
jective rationalization of the value of information ser- 
vices. 
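
The arithmetic behind the reductio ad absurdum is simple enough to restate as a worked example. The sketch below uses only the three figures given in the text; the function name is hypothetical.

```python
def cost_per_use(report_cost, uses):
    """Cost of the report spread evenly over each time it is referred to."""
    return report_cost / uses


investigation_cost = 20_000  # dollars, cost of the scientific investigation
report_cost = 2_000          # dollars, cost of the report thereon
uses = 10                    # times the average report is referred to

per_use = cost_per_use(report_cost, uses)
share = per_use / investigation_cost

print(f"${per_use:.0f} per reference, {share:.0%} of the investigation cost")
```

The calculation says nothing about the benefit side of the comparison, which is why the evaluation remains a subjective rationalization.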

What is really needed is a way of determining what 
the situation would be, from an economic point of view, 
if information is not made available; e.g., how profitable 
a manufacturing plant would be if an information 
service of specified caliber is available during its design, 
compared to the reverse situation. This is a very difficult 
problem, but similar problems, when well defined, have 
been capable of reasonable solution via operations re- 
search techniques, even when doin conditions pre- 
vail, as in this instance. 


TECHNICAL CONSIDERATIONS 
General 


With respect to true users’ needs, three types of tech- 
nical considerations are pertinent: the performance of 
information services as concerns their timeliness, quality, 
and form or format of information or data supplied to 
the user. 


Timeliness 


Certain users need information more quickly than 
others; therefore, a measure of timeliness requirements 
(like that of permissible cost) will be of a probabilistic 
nature. This factor applies either to a current awareness 
service (i.e., how current is the information supplied?)
or to a retrospective service (i.e., how soon does a search
result in an "answer"?). 

The principal problem here is that of being able to 
characterize users (and information-need situations) so 
that timeliness requirements can be objectively measured 
or predicted. 


Quality of Information Supplied to Users


General. It is probable that the quality of information 
required by (and supplied to) users should be capable 
of definition via three parameters: specificity, complete-
ness and relevance. These parameters are discussed in
more detail in the following paragraphs. 

Specificity. Different users require information of
different degrees of specificity. Some users may usually
require quantitative data (e.g., the boiling point of water
under a pressure of 300 psig). Other users may usually
require conceptual (even subjective or speculative)
information (e.g., a discussion of the likelihood of intelli-
gent life existing elsewhere in our galaxy). Between the
limits set by purely quantitative and purely qualitative
information, there exists a continuum. Against this con-
tinuum each user will exhibit a distribution of interest
points, but each user will also probably find that most
of the time his requirements for specificity fall within a
relatively narrow range.

Completeness. Different users require different de-
grees of completeness¹ of the information supplied to
them (i.e., different degrees of exhaustiveness of re-
trieval). Hence this factor also exhibits a probabilistic
nature. It is not always best to supply the user with all 
documents that answer his need. For example, the user 
who needs to know the boiling point of water at 300 psig 
would prefer to have a dimensioned number—a tempera- 
ture—supplied to him. Lacking that, he would like to 
have one authoritative document in which the desired 
datum is recorded. He would be quite unhappy to re-
ceive a hundred, or even a dozen, documents even if all
contain the desired datum. On the other hand, the 
scientist who needs less specific information is also likely 
to require more completeness of retrieval. Thus speci- 
ficity requirements and completeness requirements inter- 
act. 

Relevance. Different users require different degrees of 
relevance of the information supplied to them. Thus a 
probabilistic distribution of requirements exists here, 
also. In general, the more specific the information de-
sired, the greater the need for relevance, that is, the need
not to be burdened with nonpertinent information or
documents, or both. The reverse situation also holds
true; the more general the information desired, the less 
the need for relevance. Similarly, the greater the need 
for completeness, the less the need for relevance, in that
peripherally pertinent information tends to be acceptable 
and even useful in such instances. 
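
The interplay between completeness and relevance can be made concrete with a small sketch. It assumes, purely for illustration, that completeness is measured as the share of pertinent documents delivered (the "recall" of the footnote) and that relevance is measured as the share of delivered documents that are pertinent; the second definition, the function names, and the document numbers are assumptions, not part of the original text.

```python
def completeness(retrieved: set, pertinent: set) -> float:
    """Share of the pertinent documents that the search delivered ("recall")."""
    return len(retrieved & pertinent) / len(pertinent)


def relevance(retrieved: set, pertinent: set) -> float:
    """Share of the delivered documents that are pertinent (assumed measure)."""
    return len(retrieved & pertinent) / len(retrieved)


# Hypothetical collection: documents 1-12 all contain the desired datum.
pertinent = set(range(1, 13))

# A user who wants one number is served by a single authoritative document.
narrow_answer = {1}
# A user surveying a field receives everything pertinent plus peripheral items.
broad_answer = set(range(1, 21))

for label, answer in (("narrow", narrow_answer), ("broad", broad_answer)):
    print(label,
          f"completeness={completeness(answer, pertinent):.2f}",
          f"relevance={relevance(answer, pertinent):.2f}")
```

Both functions assume non-empty sets. The narrow answer scores high on relevance and low on completeness, the broad answer the reverse, which is the trade-off described above.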


Form or Format of Information or Data 


This factor is principally concerned with ease of use 
and is probably the easiest-to-measure user need. For 
example, should data be displayed as tables, graphs, 
alignment charts, or equations? Should the “raw,” non- 
reduced data be included? How should a display make 
plain the constant (yet specific) conditions under which 
the data were collected—conditions which, if changed, 
might change the data values observed? Should the data 
be described (e.g., possibly for announcement purposes) 
by an abstract? Should the data and/or abstracts be 
indexed, and if so, how deeply? How should different
sets of data be grouped or categorized so that proximity
of related data sets (from the users’ point of view) is
maximized?


A similar set of considerations applies with respect to
textual information. What level or levels of surrogation
should be made available to the user: full text, sum-
mary, informative abstract, indicative abstract, notation-
of-content, citation, title, or document number? What
“depth” of indexing is required, if any? How should
documents or their various surrogates be grouped or
categorized?

¹ Sometimes called “recall.”

With respect to indexes alone, there are important 
considerations (other than that of indexing “depth”) 
related to format, type of index entry, and character of 
index terminology. What format is optimal? Should a 
particular index be a classification or an alphabetical 
index (modified or not by subheads)? Should index 
entries be merely document numbers; or should they 
include citations, notations-of-content, or even abstracts? 
Should the index terms be complex (e.g., classification
notations) or simple (e.g., uniterms) or something in
between (e.g., subject headings of varying degrees of
complexity)?

With respect to format generally, should the data, 
information, surrogates, indexes, etc., be printed; and if 
so, on pages or on cards, and in what arrangement on 
page or card? Should these materials be full size or 
microform (continuous or discrete, e.g., microfiche;
positive or negative image; transparent or opaque) or
something between full size and microform? Or should 
these materials be recorded on other types of media, such 
as magnetic tape, drums, matrices, discs, chips, etc.? If 
the recording medium is something other than full-size 
printed text, what sorts of display equipment with what 
speeds of access and operation are required?


* Information Processing after Origination but 
Prior to End-Use 


GENERAL 


It is in this area that nearly all investigatory work
has been concentrated for centuries; unfortunately, too
often with too little consideration for input (see follow-
ing paragraphs) or output (see preceding paragraphs)
restraints. It is thus no small wonder that an informa- 
tion crisis exists today; it is surprising that even a worse 
crisis has not developed. This is not to denigrate the 
effort that has been expended with respect to information 
processing. Such effort is necessary; it is, however, not 
sufficient, and considerations of input and output should
exert a much greater influence on information processing
activities. This is particularly true with respect to the 
needs of users (and the statistical distribution of those 
needs), as already described. 

Despite this caveat, let us examine the overall field 
of information processing. A number of questions (but 
not all appropriate questions, undoubtedly) spring to 
mind immediately. What are the “unit processes” of 
this “middleman” operation? How may these “unit 
processes” be assembled into various systems in order to
serve the needs of various users, taking into account also
the characteristics of input information? How may the
details of the “unit processes” be varied, and for what
reasons, considering also the interactions among the
processes both within and among systems? How may
systems, each designed to serve a certain type of need by
processing a certain type of input, best be intercon-
nected? To what degree, and why, should such systems
be centralized or decentralized?

Within the processing area, we can detect seven dis- 
tinct "unit processes," each of which may be widely 
varied in detail and in scope. These are the processes 
that we shall tag with the general terms of acquisition, 
surrogation, announcement, index operation, document 
management, correlation, and vocabulary control. From 
one or more of these seven processes can be constructed 
every information or data system, excluding those con- 
sisting of direct communication between the originator 
and user of information. These unit processes are 
briefly discussed in the next section. 


THE “UNIT PROCESSES” OF INDIRECT COMMUNICATION
Acquisition 


This process also encompasses evaluation and/or selec- 
tion of documents or data for input into current aware- 
ness, announcement and/or retrieval systems, into cor- 
relation or data reduction processes, or for use without 
further processing; it also includes descriptive catalog- 
ing and duplicate checking. 


Surrogation 


This process includes abstracting, indexing, data reduc- 
tion, or the like. It should be noted that information 
not present in the full-text material, or raw data, cannot 
be reduced to surrogates. Thus surrogation is no substi-
tute for serendipity, for reasoning by analogy, or for
creative thought generally. 


Announcement 


This process encompasses the assembly, production 
and distribution generally or specifically (i.e., selective
dissemination of information) of announcement media 
for documents, data tables, etc. 


Index Operation 


This process includes input of index information into 
a physical medium and the searching of that medium 
routinely or on demand (including the “negotiation” and 
formal formulation of inquiries), and the provision of 
outputs consisting of actual data or of references 
and/or other useful document surrogates (e.g., abstracts). 


Document Management 


This process encompasses the physical storage and 
retrieval (based upon “addresses” only) of documents or
data and the reproduction and inventory control of such
items.


Correlation


This process creates generalized information from
multiple, other, more specific items. The creation of
state-of-art reviews is an example of the correlation
process. Analysis to determine the significance of re-
duced data is another example. In short, correlation is
a creative activity and differs from the research process
(as it is usually defined) only in that correlation per se
requires no material experimental activity on the part
of the correlator.


Vocabulary Control


This process includes all operations useful in bringing
into conjunction the vocabularies of originators, proc-
essors, and users of data and information, to the end
that the communication channel will be as effective and
yet noise-free as possible. The physical impedimenta
with which the vocabulary control process is most often
concerned are the various types of authority files (in- 
cluding but not limited to subject authority files), 
together with their syndetic structures and conventions 
for updating and use. 


SYSTEMATIZATION OF THE “UNIT PROCESSES”


If vocabulary control is considered as a general func- 
tion of all systems, to be applied (or not) to the required 
degree, then a limited number of basic information sys- 
tems are possible. Figure 1 indicates that only eight 
basic systems are possible, excluding that system (num- 
bered “zero”) consisting of direct communication be- 
tween the originator and user of information. That is, 
only eight unique paths through the “unit processes” 
from origination to end-use are possible. 

External to a given system, of course, things can get 
much more complicated. Interfaces between informa-
tion systems can occur, bringing about (in effect) loops
in the pattern. Two principal types of functional inter-
faces can exist. The first, indicated by a dotted line,
may be termed reorigination. The second, indicated
by dashed lines, may be termed reacquisition. In addi-
tion, interfaces with respect to information or data
coverage and clientele servicing may exist.


DESIGNING THE INFORMATION SYSTEM


Ideally, when users' needs have been defined, the re- 
quired combination of “unit processes" should be as- 
sembled to create a system, and the necessary input 
information or data should be located and processed for 


use. Because the details of the “unit processes” interact
with each other, each such process must be designed as a
part of the system, not independently. For example,
the surrogation process of indexing for announcement
purposes only should probably differ markedly from the 
process of indexing for retrospective retrieval purposes. 
Again, the work done in developing variants of the 
“unit processes” and in meshing them into systems has 
been valuable and necessary. It must be reiterated, 
though, that such work is not sufficient to solve all the 
problems that face us. Work on improving the processing 
of information, between its origination and use, must 
take place with full cognizance of the restraints applied 
by origination and use. 

In conclusion with respect to information processing, 
we note that not only the environmental variables (e.g., 
users’ needs) but also the available operational tech- 
niques all interact strongly. 


* Information Origination 


Of the three areas of concern, that of information origi- 
nation has until very recently consumed more of our 
effort, and has received less of our rational attention, 
than has either of the other two areas. For example, we 
have spent much time and money in publishing informa- 
tion, but relatively little in deciding what should be 
published. It is apparent that much (perhaps most) of 
the information being originated should never go “on the 
record," that a great outflowing of trivia or duplicative 
information is being published or disseminated in one 
manner or another. Such a proportion (perhaps pre- 
ponderance) of dross in our raw material makes re- 
covery of valuable materials much more difficult and 
costly. The problem facing us is: “How can we effect 
birth control of information such that we exclude from 
the communication pattern as much of the trivia as 
possible without excluding useful information?” Several 
approaches are apparent, but none of them seem to have 
the full potential which we desire. 

We can, of course, continue to work to make available 
(to each potential originator of information) the perti- 
nent scientific or technical information developed in the 
past, in the hope that the potential originator would be 
so knowledgeable that he would not undertake the
pseudocreative effort to originate trivia or duplicative
information. This may be a forlorn hope, in view of the 
all-too-frequent practice of re-reporting one's own work. 
All other things being equal, human beings may still 
attempt to build their own professional stature by what- 
ever means they can. 

Another approach would be to insure that all work 
is well evaluated (before its publication or dissemination) 
by knowledgeable yet disinterested referees. Here we 
face the problem of finding ways of inserting referees 
into each of the many (often devious) routes that a 
persistent originator might employ to inject his trivia 
into the communication pattern. We also face the prob-
lem of having truly knowledgeable referees, although
this obstacle may someday be overcome by giving
to each referee not only the candidate informational
item but also all pertinent information on the same sub-
ject which has been developed in the past. Finally, under
such circumstances, will not the referees need compen- 
sation, and if so, who will pay? 

The most effective solution to the information birth
control problem (and the most difficult solution to
implement) would be to attack the problem at the
source, at the originators themselves. We should like to
find a way of making the originators actively want to
avoid reporting trivia. At this time, however, I see no
positive incentive for originators to be so altruistic.
There seems to exist only the negative incentive of
ridicule, and ridicule is relatively ineffective because
there will always be uninformed readers who will grant
to the originator of trivia at least some reputation or
acclaim.

It seems likely that the information origination prob-
lem will be with us for some while.


* Summary


Many attempts made to date to solve information
problems have been either ineffective or else not demon-
strably optimally successful because insufficient atten-
tion has been given, during the problem-definition stage,
to a sufficient number of the important variables in-
volved. In this usual “methods” approach, the interac-
tions between only two (or among a few) variables are
investigated, without control or perhaps even without
observation of other important variables, and problem
solutions are developed only in terms of the controlled
variables. The variables are usually probabilistic in
nature (i.e., statistical distributions exist) and all may
interact strongly. It is therefore suggested that the “sys-
tems” approach should be employed, wherein all im-
portant interacting variables are considered and mea-
sured and their measurements correlated during problem
definition and solution.

The important variables are described in three cate-
gories: information utilization, processing, and origina-
tion. In the utilization category, sociopolitical (and, to
some extent, economic) considerations constitute vari-
ables that are still difficult to define and to measure. Yet
the distinction must be made, before problem definition
can be effective, between true user needs and apparent
user “wants,” the latter the result of habit, complacence,
ignorance, fear, or lack of motivation. Similarly, the
value of information to users must be quantified. Be-
yond sociopolitical and economic variables, there are
technical variables of user needs with respect to time
(or timeliness) and quality of service provided. The
latter requirement variable can be subdivided into the
“factors” of relevance, completeness, specificity, form,
and format.

The processing variables are related to functions that
may be performed by an information facility, i.e.,
acquisition (and associated activities), surrogation (e.g.,
abstracting and indexing), announcement, index opera-
tion (including input and searching), document manage-
ment (including storage, dissemination, and related proc-
esses), correlation (the creation of generalized informa-
tion from multiple, more specific inputs), and vocabulary
control. It is shown that the possible combinations of
these processes permit only eight basic types of informa-
tion facilities to exist.

Information origination appears to be the least well- 
defined problem area, and further definition of its 
variables is required. The problem may be broadly 
stated as that of excluding trivia and other information 
of low utility from the communication channels while 
still accepting valuable information. This statement, 
however, is much too general to serve operational pur- 
poses in information research endeavors. 
Distribution of Indexing Terms for Maximum Efficiency of
Information Transmission


A function was developed for the optimum distribu-
tion of indexing terms by the number of postings. This
makes it possible to transmit information with maximum
efficiency. The comparison of the actual distribution
of the term groups with the calculated optimum dis-
tribution provides an objective measure for evaluating
any indexing system with respect to its efficiency as
an information transmission channel.


PRANAS ZUNDE and VLADIMIR SLAMECKA

School of Information Science
Georgia Institute of Technology
Atlanta, Georgia


Organization of indexing data in a manner that permits
retrieval of the greatest amount of information is a
premise fundamental to an efficient performance of all
information storage and retrieval systems. In this con-
text an index may be considered a channel linking the
information store and the user or searcher. The problem
may be viewed as that of organizing the information store,
that is, the indexing “terms” (subject descriptors) and
their “postings” (document identification numbers, for
instance), so as to make maximum use of channel
capacity, permitting the transmission of information
from the store to the user with maximum efficiency.

For a given system, indexing terms can be grouped by
the number of postings they carry, to form a frequency
spectrum characteristic of that system at a particular
time. This paper reports the development of a distribu-
tion function of term groups by number of postings
which allows information transmission with maximum
efficiency, and it proposes a measure of evaluation of a
system's efficiency in this respect.

The solutions are developed subject to three assump-
tions. First, no distinction is made between useful and
useless information; only the amount of information,
not its subjective value, is considered. Second, the pos-
sible effects of indexing language and of the function and
form of index terms are not taken into consideration, as
they are likely to vary from one vocabulary to another.
Third, the channel is assumed to be a noiseless one; that
is, if term $T_i$ is addressed, the probability of retrieving
it with all its postings is equal to 1.


The efficiency of a statistical information transmission
model is defined as

$$ \eta = \frac{I(T,Y)}{C} \qquad (1) $$

where $I(T,Y)$ is the transinformation and $C$ is the chan-
nel capacity. The channel capacity is in turn defined as

$$ C = \max\,[H(T) - H(T/Y)] \qquad (2) $$

In this equation, $H(T)$ is the source entropy, which in
our case is the entropy derived from the relative fre-
quencies of indexing terms by the number of postings,
and $H(T/Y)$ is the conditional entropy, or average uncer-
tainty, given that a particular symbol has been received,
as to the symbol which was transmitted.

Since the channel was assumed noiseless, the conditional
entropy

$$ H(T/Y) = 0 $$

and the transinformation

$$ I(T,Y) = H(T) $$

Thus

$$ C = [H(T)]_{\max} \qquad (3) $$

In information theory, the entropy or measure of un- 
certainty of a complete finite scheme is given by 


$$ H(T) = -\sum_{t} p(t) \log p(t) \qquad (4) $$


where $p(t)$ is the probability of the occurrence of the
event $T$ (in our case, the probability of term group $T_t$
having $t$ postings).
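
As an illustration of Eq. (4), the following sketch groups the terms of a small, invented index by their number of postings and computes the source entropy of the resulting frequency spectrum. The function name and sample data are hypothetical, and natural logarithms are assumed so that the result is in nits, the unit used later in the paper.

```python
import math
from collections import Counter


def source_entropy(postings_per_term):
    """Entropy (in nits) of the distribution of terms by number of postings."""
    # Size of each term group: number of postings -> number of terms carrying it.
    group_sizes = Counter(postings_per_term)
    total_terms = len(postings_per_term)
    return -sum((size / total_terms) * math.log(size / total_terms)
                for size in group_sizes.values())


# Hypothetical index of eight terms; each entry is one term's posting count.
sample_postings = [1, 1, 1, 1, 2, 2, 3, 5]
print(f"H(T) = {source_entropy(sample_postings):.4f} nits")
```

Here the term groups have relative frequencies 1/2, 1/4, 1/8, and 1/8, so the entropy is 1.75 bits, or 1.75 ln 2 nits.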

From Eq. (1) we see that the efficiency of an index 
system is highest when transinformation is equal to the 


channel capacity. Since in our case the channel capacity
is equal to maximum source entropy, the problem is to
find such a frequency distribution of term groups by
number of postings which produces maximum source
entropy. The solution is subject to two constraints:

$$ \sum_{t} p(t) = 1 \qquad (5) $$

$$ \sum_{t} t\,p(t) = \bar{t} = \text{const} \qquad (6) $$

The first constraint states that the sum of the relative
frequencies is equal to one (viz., we have a complete
finite scheme). The second constraint states that there
is a fixed amount of postings in a given system, expressed
as the average number of postings per term.

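Using the method of calculus of variations and Lagrange
undetermined multipliers, we can write:

$$ \delta H(T) = -\sum_{t} [\ln p(t) + 1]\,\delta p(t) = 0 \qquad (7) $$

$$ \alpha \sum_{t} \delta p(t) = 0 \qquad (8) $$

$$ \beta \sum_{t} t\,\delta p(t) = 0 \qquad (9) $$

Adding Equations (7), (8), and (9), with the sign of (7)
reversed and the constant 1 absorbed into $\alpha$, we get

$$ \sum_{t} [\ln p(t) + \alpha + \beta t]\,\delta p(t) = 0 \qquad (10) $$

hence

$$ \ln p(t) + \alpha + \beta t = 0 \qquad (11) $$

or

$$ p(t) = e^{-\alpha} e^{-\beta t} \qquad (12) $$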


To find the Lagrange constants $\alpha$ and $\beta$, we turn to the
constraint equations. Substituting Eq. (12) into Eq.
(5), we obtain

$$ e^{-\alpha} \sum_{t=1}^{n} e^{-\beta t} = 1 \qquad (13) $$

hence

$$ e^{-\alpha} = \frac{1}{\sum_{t=1}^{n} e^{-\beta t}} \qquad (14) $$

Substituting the expression for $e^{-\alpha}$ into Equation (12),
we obtain

$$ p(t) = \frac{e^{-\beta t}}{\sum_{i=1}^{n} e^{-\beta i}} \qquad (15) $$

Substituting the expression in Eq. (15) for $p(t)$ into
the second constraint, Equation (6), we find

$$ \frac{\sum_{t=1}^{n} t\,e^{-\beta t}}{\sum_{t=1}^{n} e^{-\beta t}} = \bar{t} \qquad (16) $$

The summation limits are the lowest and the highest
number of postings under any one term in our vocabulary
(i.e., 1 and $n$). This range of postings is, obviously,
bounded; we can, however, extend the upper summation
limit to infinity by simply considering the remainder in
our series as the sum of terms corresponding to the fre-
quency of terms with “$k+1$ and more” postings, pro-
vided $k$ is large enough.¹ Both series in the numerator
and denominator are convergent series for $\beta > 0$. Under
this assumption, the sum of the series in the denominator
is easily established. Thus


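$$ \sum_{t=1}^{\infty} e^{-\beta t} = \frac{e^{-\beta}}{1 - e^{-\beta}} \qquad (17) $$

The limit of the series in the numerator can be found as
follows:

$$ \sum_{t=1}^{\infty} t\,e^{-\beta t} = -\frac{d}{d\beta} \sum_{t=1}^{\infty} e^{-\beta t} = -\frac{d}{d\beta}\left[\frac{e^{-\beta}}{1 - e^{-\beta}}\right] = \frac{e^{-\beta}}{(1 - e^{-\beta})^{2}} \qquad (18) $$

Hence

$$ \bar{t} = \frac{e^{-\beta}/(1 - e^{-\beta})^{2}}{e^{-\beta}/(1 - e^{-\beta})} = \frac{1}{1 - e^{-\beta}} \qquad (19) $$

and

$$ \beta = -\ln\left(1 - \frac{1}{\bar{t}}\right) \qquad (20) $$

By substituting the value of $\beta$ into Eq. (15), we
obtain the frequency distribution function that maxi-
mizes the source entropy:

$$ p(t) = \frac{(1 - 1/\bar{t})^{t}}{\sum_{i=1}^{\infty} (1 - 1/\bar{t})^{i}} \qquad (21) $$

The limit of the convergent series in the denominator is
easily obtained as

$$ \sum_{i=1}^{\infty} \left(1 - \frac{1}{\bar{t}}\right)^{i} = \bar{t} - 1 \qquad (22) $$

and finally we write

$$ p(t) = \frac{1}{\bar{t} - 1} \left(1 - \frac{1}{\bar{t}}\right)^{t} \qquad (23) $$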

For an information storage and retrieval
system with 1,000 documents, each of which is indexed
by 22 terms on the average, with a vocabulary of 2,000
terms and the average number of postings per term
$\bar{t} = 11$, the optimum frequency distribution of term
groups to produce the maximum average amount of
information per term would have approximately 182
terms with one posting, 165 terms with two postings,
146 terms with three postings, and so on. The percentage


¹ The effect of the approximation error should be evaluated in each
particular instance. For small $k$ and $\beta$ ($\beta \ll 1$), it might be significant.
of terms which should have 100 or more postings can be
calculated as follows:

$$ \sum_{t=100}^{\infty} p(t) = \sum_{t=1}^{\infty} p(t) - \sum_{t=1}^{99} p(t) = 1 - \frac{1}{\bar{t} - 1} \sum_{t=1}^{99} \left(1 - \frac{1}{\bar{t}}\right)^{t} $$

Thus, in this example of a system, 0.072% of terms
(between one and two terms) should have 100 or more 
postings. The percentage of terms with high numbers of 
postings clearly depends on the average number of 
postings per term: the greater this average, the higher 
the percentage of heavily posted terms. 
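
The optimum distribution lends itself to a short numerical sketch. The code below evaluates the geometric form $p(t) = (1 - 1/\bar{t})^{t}/(\bar{t} - 1)$ for the example system; the function names are illustrative, and the closed form used in `tail_fraction` is not stated in the paper but is obtained here by summing that geometric series from $k$ upward.

```python
def optimum_fraction(t: int, mean_postings: float) -> float:
    """Optimum fraction of terms that should carry exactly t postings."""
    ratio = 1.0 - 1.0 / mean_postings
    return ratio ** t / (mean_postings - 1.0)


def tail_fraction(k: int, mean_postings: float) -> float:
    """Fraction of terms with k or more postings (geometric tail, assumed form)."""
    return (1.0 - 1.0 / mean_postings) ** (k - 1)


vocabulary_size = 2_000   # terms, from the example in the text
mean_postings = 11.0      # average postings per term, from the example

for t in (1, 2):
    expected_terms = vocabulary_size * optimum_fraction(t, mean_postings)
    print(f"{t} posting(s): about {expected_terms:.0f} terms")
```

Calling `tail_fraction(k, mean_postings)` for different averages shows the dependence noted above: the larger the average number of postings per term, the larger the share of heavily posted terms.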

An immediate use of the concept of term group dis-
tribution is in the evaluation of existing information sys-
tems. Figure 1 compares the actual and optimal term
group distribution curves of two information systems,
that of the Defense Documentation Center (in 1960)
and a private experimental one (1). Although the actual
distribution curves of both systems (solid lines in the
graph) differ from their optima (broken lines), imply-
ing a less-than-efficient use of channel capacity, the
deviation is considerably more serious in the experi-
mental index.

Fig. 1. Actual vs. optimum term-group cumulative distribution in two information systems: 1, Defense Documentation
Center System (1960); 2, private experimental system (1960).

To calculate the efficiency coefficient of a given system,
we calculate the transinformation $I(T,Y)$ from the given
frequency distribution of term groups. Next, the chan-
nel capacity for the system can be derived as follows.

It has been shown (2) that, for a discrete random
variable with

- of the efficiency coefficient will show that 


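$$ \sum_{t} p(t) = 1 \quad \text{and the mean} \quad \sum_{t} t\,p(t) = \bar{t} $$

$$ H(T) = -\sum_{t} p(t) \ln p(t) \le \ln M(\beta) + \beta \bar{t} \qquad (24) $$

with equality if and only if $p(t)$ is the maximizing
function of the entropy equation. Then

$$ M(\beta) = \sum_{t=1}^{\infty} e^{-\beta t} \qquad (25) $$

where $\beta$ is given by

$$ \beta = -\ln\left(1 - \frac{1}{\bar{t}}\right) \qquad (26) $$

Furthermore, we have already shown that

$$ \sum_{t=1}^{\infty} e^{-\beta t} = \frac{e^{-\beta}}{1 - e^{-\beta}} \qquad (27) $$

Substituting the expression for $\beta$ into Equation (27),
we get

$$ M(\beta) = \frac{1 - 1/\bar{t}}{1 - (1 - 1/\bar{t})} = \bar{t} - 1 \qquad (28) $$

Thus, we can rewrite Equation (24) as follows:

$$ [H(T)]_{\max} = \ln M(\beta) + \beta \bar{t} = \ln(\bar{t} - 1) - \bar{t} \ln\left(1 - \frac{1}{\bar{t}}\right) $$

$$ = \bar{t} \ln \bar{t} - (\bar{t} - 1) \ln(\bar{t} - 1) \qquad (29) $$

which is the expression for channel capacity.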

According to the last equation, the channel capacities
of the experimental system with $\bar{t} = 3.9$ and of the DDC
system with $\bar{t} = 148$ would be 2.22 nits and 6 nits, re-
spectively. By interpolating the graph reproduced in
Houston and Wall’s article (1) (since complete data
were not available to us), we obtained approximately 1.66
nits actual source entropy (= transinformation in the
given case) for the experimental system and 5.4 nits for
the DDC system. Hence, the efficiency coefficients are
approximately 0.75 (75%) and 0.9 (90%), respectively.
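
These figures can be reproduced with a brief sketch of the channel-capacity expression, $C = \bar{t} \ln \bar{t} - (\bar{t} - 1) \ln(\bar{t} - 1)$, and of the efficiency ratio for a noiseless channel. The function names are illustrative; the averages and entropies are those quoted in the paragraph above.

```python
import math


def channel_capacity(mean_postings: float) -> float:
    """Channel capacity in nits for a given average number of postings (> 1)."""
    t = mean_postings
    return t * math.log(t) - (t - 1.0) * math.log(t - 1.0)


def efficiency(actual_entropy: float, mean_postings: float) -> float:
    """Efficiency coefficient: actual source entropy divided by capacity."""
    return actual_entropy / channel_capacity(mean_postings)


# (system, average postings per term, interpolated source entropy in nits)
systems = (("experimental", 3.9, 1.66), ("DDC", 148.0, 5.4))

for name, t_bar, entropy in systems:
    print(f"{name}: C = {channel_capacity(t_bar):.2f} nits, "
          f"efficiency = {efficiency(entropy, t_bar):.2f}")
```

The capacity is defined only for an average greater than one posting per term, since the logarithm of $\bar{t} - 1$ is otherwise undefined.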
In their article Houston and Wall, having investigated
a number of indexes for frequency distribution of post-
ings under the terms, came to the conclusion that the
distribution is lognormal for all indexes. Furthermore,
they derived an empirical probability distribution func-
tion, which is supposed to give the frequency spectrum of
all actual indexes, if the sample they investigated was
representative enough. The proposed function is:


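[Eq. (30): lognormal frequency function $p(x)$ with parameters $A$ and $B$; the expression is not recoverable from this copy.]

where

$p(x)$ = number of terms with $x$ postings ($x = 1, 2, 3, \ldots$)

$A = 2.0 - 0.14 \log_{10} P$

$B = 0.67 \log_{10} P - 2.4$

$P$ = total number of postings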
It can be shown that the average source entropy for this 
type of distribution can be approximated by 


$$ H(X) = -\sum_{x} p(x) \ln p(x) \qquad (31) $$

[The closed-form approximation printed as the right-hand side of Eq. (31), an expression in $A$, $B$, and the error function evaluated at the maximum number of postings per term, is not recoverable from this copy.]


It has been observed that the maximum number of
postings per term is approximately $x_{\max} = 0.03P$. Substi-
tuting this value in Equation (31), we get

[Eq. (32): $H(X)$ expressed as a function of $\log P$ alone; the expression is not recoverable from this copy.]
Houston and Wall also propose in their paper an
empirical formula relating the total number of terms
in the index with the total number of postings:

$$ T = 3{,}300 \log (P + 10{,}000) - 12{,}600 $$

where $T$ is the total number of terms and $P$ the total
number of postings. The formula is valid for the range
of $P$ from 10,000 to 1,000,000. Then

$$ \bar{t} = \frac{P}{T} = \frac{P}{3{,}300 \log (P + 10{,}000) - 12{,}600} \qquad (33) $$

Substituting this expression into Equation (29), we
obtain the channel capacity as a function of $P$. On the
other hand, from Equation (32) we can calculate the
actual source entropy for indexes for various values of
$P$, if the frequency distribution of their postings con-
forms with Houston and Wall's formula. The correspond-
ing efficiency coefficient will be the ratio of these values.
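
The first half of this calculation, channel capacity as a function of $P$, can be sketched as follows. The sketch assumes that the logarithm in Houston and Wall's formula is to base 10 (the formula as quoted does not state the base), and the function names are illustrative. The actual source entropy of Eq. (32) is not computed here.

```python
import math


def total_terms(postings: float) -> float:
    """Empirical vocabulary size T = 3,300 log10(P + 10,000) - 12,600."""
    if not 10_000 <= postings <= 1_000_000:
        raise ValueError("formula is stated only for 10,000 <= P <= 1,000,000")
    # Base-10 logarithm is an assumption; see the lead-in above.
    return 3_300 * math.log10(postings + 10_000) - 12_600


def mean_postings(postings: float) -> float:
    """Average number of postings per term, P / T."""
    return postings / total_terms(postings)


def channel_capacity(t_bar: float) -> float:
    """Capacity in nits: t ln t - (t - 1) ln(t - 1)."""
    return t_bar * math.log(t_bar) - (t_bar - 1.0) * math.log(t_bar - 1.0)


for postings in (10_000, 100_000, 1_000_000):
    t_bar = mean_postings(postings)
    print(f"P = {postings:>9,}: T = {total_terms(postings):7.0f}, "
          f"mean postings = {t_bar:6.1f}, C = {channel_capacity(t_bar):.2f} nits")
```

Dividing an observed source entropy for a given $P$ by the corresponding capacity then gives the efficiency coefficient.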


Fig. 2. Actual source entropies $H$ and channel capacities $C$ of systems with indexing terms distributed according to
Eq. (30).

In Fig. 2, channel capacities and source entropies of
indexes that obey the empirical distribution formula of
Houston and Wall are plotted as functions of the total
number of postings $P$.

A fuller discussion of the uses of this concept in the
design of information systems and in the control and
optimization of the indexing and storage apparatus of
existing systems is in preparation.
$'s and Secrets


The purpose of this brief commentary is to attempt to stimulate the interest of researchers concerned with the scientific and technical information problem into considering the impact that $'s and secrets could have on the problem.

“$'s” as used here relate to the proprietary interests of industry in its scientific and technical information, whether resulting from industry funds exclusively, or whether including some public funds. The $'s also relate to the publishing industry, and to copyright.

“Secrets” as used here relate to the responsibility of the Federal establishment to preserve the security of the United States and the attendant need to control that scientific and technical information vital thereto.

While these aspects of the scientific and technical information problem (STIP) have, indeed, been recognized, most present STIP research seems to shy away from $'s and secrets. It's easy to understand why, since both present extremely sensitive and delicate problems. In the case of $'s, all sorts of industrial rights are involved. As for secrets, national security itself could be affected. And notwithstanding $'s, secrets, and the STIP, such progress in science and engineering is being made that it may be that the STIP itself has been overemphasized.

If we accept the observation that progress is being made in science and engineering, irrespective of $'s, secrets, and the STIP, then we can hypothesize either that the STIP only might make scientific and engineering progress slower or more expensive than optimal, or that the progress that is being made is accomplished by those scientists and engineers who are Information-Haves. That is, technological progress is made both by those people who really have no STIP, and by those who have the necessary access and need-to-know for $'s and secrets. Should this line of reasoning be even remotely correct, then $'s and secrets could be a more important portion of the STIP than those portions that we now emphasize.

[Fig. 1. Information/data transfer spectrum. Legible labels: Sources of Information/Data (Domestic and Foreign); Release Conditions; Initial.]

Consider the information/data transfer spectrum shown in Fig. 1. The purpose of this figure is to accentuate the impact that release conditions ($'s and secrets) could have on information transfer. Obviously, if the “needer” knows the significant originators, and if he has the necessary need-to-know, he can go directly to the originators and forget the rest of the spectrum (e.g., join the two ends of the figure together). But, possibly many of us don't know all the significant originators, nor do we have all the need-to-know. However, for that information not constrained by $'s and secrets, we soon learn where to approach the transfer spectrum. But what do we do about $'s and secrets?

When the $'s relate to scientific or technological achievements that have industrial market value (e.g., a patent) and if the achievements can be identified, then purchase alone can provide the information. If not, probably there's nothing to be done but to take the chance of duplicating. And in the case of commercial publications, the only solution may be to wait until the information is for sale.


When the scientific and technical information is secret, how do you know it exists, unless of course you are already doing work for one of the Federal agencies? And, even if you are, how do you know that you have access to necessary information produced by some other Federal agency?

Typical questions that arise as a consequence of thinking about $'s, secrets, and the STIP are:

1. Do $'s and secrets have any impact on the progress of science and engineering?

2. As a general rule, is the information that is constrained by $'s and secrets better than the information that is not so constrained?


GUSTAVUS S. SIMPSON, JR., AND JOHN W. MURDOCK
Battelle Memorial Institute
Columbus Laboratories
Columbus, Ohio
American Documentation, April 1967


Handier Writing

No matter how many machines we have around us, it turns out to be necessary to scribble a note to someone, sometime. And what curious things they are, these notes, with the ever-present danger of being misunderstood, not to mention a kind of contempt for the medium inherent in the commonly poor calligraphy.

Figure 1 shows a sample written in haste and exhibiting most of the possible faults.

Through chance, the opportunity to reform came from a booklet (1). Subsequently a (mostly British) collection (2, 3, 4) formed itself, and a self-taught version of Chancery script evolved (Fig. 2), a process by no means finished.

The question most often asked is, “But then my handwriting would look like everyone else's, wouldn't it?” Fortunately for self-respect and for cashing checks, individual scripts really are individual.

I urge you to try this singular adventure of reforming your script (if it needs it).

[Fig. 1. Before]

[Fig. 2. After]

1. Tarr, J. C., Good Handwriting and How to Acquire It, 3d ed., Phoenix House, London, 1954.
2. Dumpleton, J. L., Teach Yourself Handwriting, English Universities Press, London.
3. Fairbank, A., A Book of Scripts, rev. ed., Penguin, Harmondsworth, 1952.
4. West, A., Written by Hand, George Allen and Unwin, London, 1951.

KARL F. HEUMANN
Bethesda, Maryland
Letters to the Editor


Dear Sir:

Almost all the world's underdeveloped nations, whether in Asia, Africa, or Latin America, have one common goal: to catch up in every sphere of activity with the developed nations as rapidly as possible. Methods and techniques are of no consequence in their modernization aims.

Recently two items, arriving on the same day in my mail, seemed to gybe. One was a paper delivered by Dr. Roger Revelle, Director of the Scripps Institute of Oceanography, University of California at La Jolla, at the Special National Conference called by the U.S. National Commission for Unesco in New Orleans in September 1968. The paper, entitled “Science and Social Change,” urged the utilization of all the social sciences in dealing with both the problems of developed nations (avoidance of nuclear war, big computers and big government, and urban problems), and those of the underdeveloped countries (the population explosion, nutrition and the food supply). To quote Dr. Revelle (1):


Of perhaps even greater importance than scientific agronomy and applied genetics are the economic, social, and political problems of agriculture in the less-developed world. These involve farm credit, marketing, storage, transportation, land tenure, crop diversification, investment in processing agricultural products, and above all, communication with, and motivation of, the farmers.


It is perfectly obvious that unless information on the above topics and on technical “know-how” can be made available in very many sections of each agricultural nation in which the need exists either to expand areas of land not presently under cultivation or to multiply crop yields, the possibility of solving the nutrition and food supply problems seems doomed from the start.

Simply stated, the problem is this: in view of the underdeveloped current status of the typical library (public, university, or special) in the underdeveloped nations of the world, what can be done to utilize this vital communication device in the battle against hunger and malnutrition?

The author of the second item I received, Dr. Herman Felstehausen (Assistant Professor of Agricultural Journalism in the Land Tenure Center of the University of Wisconsin, and currently stationed at the Inter-American Land Reform Center, Bogota, Colombia) confirms Dr. Revelle's statement (2):

Availability and distribution of new materials are absolutely necessary if administrators and policy makers are to use the results of research and evaluations to improve development plans and programs. This paper proposes that the agricultural groups in Colombia establish an Agricultural Information Center for the collection and circulation of materials related to agricultural development and the social sciences.

In his paper, Dr. Felstehausen begins by discussing the difficulties in the collecting of published materials of all physical forms in Colombia, following this with data on agricultural libraries in Colombia. He believes the conventional library to be inadequate (in terms of lack of trained librarians, time lags in cataloging, circulation, storage, etc.) and not within the financial means of most Latin American nations.

To meet the needs of agricultural development Dr. Felstehausen proposes a plan for an agricultural information library that would utilize perforated cards, cataloging by computer, and various additional nonconventional techniques. Among these would be elimination of the Dewey Decimal (or LC) number in favor of consecutive numbering of items as they arrive. This number would be utilized for circulation, inventory, and shelf location. According to the paper, much of the work in physical preparation would be eliminated; yet, lists of holdings (produced by computers, which are available for rent in Bogota at reasonable rates) could be rapidly produced according to author, title, subject, date of publication, or publisher. Furthermore, since materials in the field of agriculture consist primarily of pamphlets, folders, and mimeographed papers rather than books, vertical files (or files on bookshelves) are sufficient.

Dr. Felstehausen goes into the various aspects of his proposal in detail, portraying the obvious advantages of his suggestions over traditional library routines and practices. His final recommendation is the use of a photocopying machine that not only would simplify the library-loan process and reduce the amount of items lost, but in addition, for a small charge, would provide the client with his own copy.

Librarianship in Latin America is beginning to display progress, as witness the feasibility study undertaken by the Fundación Interamericana de Bibliotecología of Buenos Aires, with funds provided by the Rockefeller Foundation and with assistance by R. R. Bowker Company, to survey the possibility of a cooperative cataloging and bibliographic publication center in Latin America (3). The recently established publications, Bibliografia and Colbav of Caracas, Venezuela, and the giant steps taken by librarianship in Mexico and Brazil, not to mention the recent research projects and responsibilities for studies on Latin American library education undertaken by the Inter-American Library School of Medellin, Colombia, are indicators that the time may be ripe for the implementation of Dr. Felstehausen's proposal in the form of a pilot agricultural information center.


What with the personnel (spell cataloger) shortages, the time lags in making items available for circulation, and the availability of electronic equipment at more reasonable rates, it appears that a goodly portion of what Dr. Felstehausen suggests comprises “the handwriting on the wall” for U.S. libraries. All the more reason, then, to apply his proposal to underdeveloped nations that want and need to “leapfrog” into the twentieth century. For many of them it is not merely a matter of prestige, but a question of the survival of masses of their citizens. A forward-looking approach in librarianship and in the imparting of life-giving information in agriculture could conceivably make the difference.
1966 issue of American Documentation (pp. 178-179), nevertheless overlooks one factor that will undoubtedly militate against the popularization of the microfiche, as it must have done among the factors enumerated by Dickison with respect to microfilms and microcards. I refer to the importance of making the text of microforms accessible to users through the principal cataloging scheme of the standard collections of the library, using the same classification, the same symbols, tracings, and cataloging rules, and filing the resulting entries in the central public catalog of the library, be that in card or book form.

[...] media for their [...] acceptance grows out of the repeated frustrations I have experienced in gaining ready access to the pamphlet files and other special collections of a library when these were not accessible through the main public catalog of the library.

To remedy this situation, I have incorporated the subject headings of the pamphlet file in the main public catalog, with gratifying results. The scheme has made the contents of the pamphlet file as readily accessible to users as the holdings in the standard collections of the library in the public catalog.

To the same end I have extended the principal cataloging scheme of the library to the organization and processing of audiovisual materials, in the design of the School Libraries Automation Project (SLAP), an ESEA Title III Operational Grant project to be implemented in the course of three years. The project also provides for depth access to important, unusual, or out-of-the-way materials in all the collections of the library, whatever the size or the form of the media, that are now lost or not readily accessible to users. It does so through a single, coordinated system of indexing and retrieval, without altering the conventional organization of the collections.

[...] organization and [...] of all the kinds and types [...] for gaining depth access to selected items in these collections is so great that the idea has been incorporated by the writer in the design of an automated library system for the worldwide campuses of Friends World Institute, an experimental college in operation since September 1965, and for the Conflict-Resolution Study and Research Center of the Institute, to be coordinated with the library resources of the worldwide campuses.

[...] extremely [...] materials is important is that the average user tends to limit himself to materials available [...] in peripheral collections. The material in these peripheral collections may be even [...] than the material in the main [...]. Where these collections are not cataloged as an integral part of the central cataloging scheme of the library, their use may be impaired even for those who may need them. Their use will be peripheral to the use of the collection accessible through the main public catalog. Important material in the peripheral collections which should nonetheless have priority over the material in the standard conventional collections may either altogether escape the attention of the user, or come to his attention after he has struggled through a maze of secondary materials that have robbed him of precious time that may well have been spent on the more essential sources.

The use of microfilms and microcards would have been many times more since their introduction in libraries, had they been cataloged according to standard procedures and the entries incorporated in the main public catalog. I know of a college that possesses the microcard edition of all the known colonial imprints, but the collection is hardly used because the only way to gain access to it is through Charles Evans' American Bibliography. Students as well as faculty may consult it with ease only if they know the date of publication and the name of the author of a work. The subject-access to the collection through Evans is clumsy and tedious, and the absence of any entries of the works of the collection in the public catalog of the college library has reduced it to an appendage of little interest or importance among the avalanche of secondary sources that constitute the bulk of the library. What a pathetic waste of financial investment and what an irreparable loss to the academic program of the college to hoard this magnificent, inclusive source of American history, literature, social life, and culture in their varied dimensions.

If microform collections are cataloged in the same way as standard collections, and the entries are incorporated in the principal public catalog of the library, their relevance and accessibility in the total program of the library will undoubtedly be multiplied many times over what they are now. This circumstance may have compensated for some of the other factors enumerated by Dickison that have militated against the popularization of the microfilm and microcard. It would assuredly do the same for the microfiche even were its use arrested by the same limitations that affected the use of the microfilm and microcard adversely. Indeed, even if all the limitations listed by Dickison were removed from the microfiche, its use would still be hampered by the relative lack of accessibility to its contents if the cataloging problem raised in this commentary is not properly attended to.

CHARLES A. VERTANES
Sponsor, Library Automation Project
Brentwood Public Schools
Brentwood, New York
and
Director of Research and Library Consultant
Friends World Institute


Dear Sir:

In the January 1966 issue of American Documentation, L. H. Mantell arrives at an estimate of the literary output of scientists and engineers in research and development during 1964 that is very much lower, as he observes, than the estimates that have been made by others. However, Mantell's method has a serious downward bias that vitiates his estimates.

On examining the annual author indexes of a number of journals, Mantell finds the number of authors who have had one publication in each journal, the number who have had two, three, and so on. Assuming that authors were selected for each journal by random sampling with replacement from a population of potential authors (a distinct population for each journal), he applies the theory of the Poisson distribution to estimate the total size of the hypothetical population of “authors” from which the actual authors were drawn.

So far so good. Summing over 20 journals, Mantell finds that 2,169 authors have appeared once, 229 twice, 44 three times, 12 four times, and 7 five times and over. The estimate, from Poisson assumptions, of the number of nonpublishing “authors” required to account for these distributions of publications was 9,332 for all the 20 journals.

Next, Mantell smooths these aggregate figures by fitting a Poisson distribution (which fits very well), and thus estimates that 74.1% of the potential authors did not publish at all, 22.2% published once, 3.3% twice, 0.3% thrice, and 0.1% four times. Applying these rates to an estimated population of 485 × 10³ scientists and engineers, he calculates their output at about 140 × 10³ titles per annum.


Even if we accept the assumptions from which the Poisson distribution is derived, this estimate is correct only if no “authors” publish in more than one journal. If there is any overlap, Mantell's estimate will be too low. This is easy to show by numerical example. Suppose a population of 100 scientists is publishing in a single journal, with frequencies like those observed by Mantell. Then the total number of papers and the average number per scientist will be just as Mantell estimates them.

Suppose, however, there are two journals, each drawing its papers independently, by Poisson sampling, from the same population of 100 scientists, and with the same total number of papers per journal. The statistics of publication will then be those shown in Table 1.


TABLE 1. Number of authors mentioned, by number of titles per author, in each of two “independent” journals

[Table body not legible. Columns: number of titles in Journal B (0 to 4); rows: number of titles in Journal A.]


We see that 55% of the population will appear in neither journal, 33% will appear once (in one or the other), 2.5 + 2.5 + 4.9 = 9.9% will appear twice (in one or both), and 1.8% will appear thrice. On the other hand, the analogue to Mantell's Table 8 will look like our Table 2.


TABLE 2. Analogue to Mantell's Table 8

             Frequency of contribution per year
Journal        0      1      2      3      4
A             74     22      3      1
B             74     22      3      1
Total        148     44      6      2
Percent       74     22      3      1


He will therefore conclude that each 100 scientists will contribute a total of (22 × 1) + (3 × 2) + (1 × 3) = 31 papers; while from our joint statistics of the two journals we conclude that each 100 scientists will contribute a total of (33 × 1) + (10 × 2) + (2 × 3) = 59 papers, which, except for rounding errors, is just twice the previous estimate, as it should be.
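
The two-journal example can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the per-journal Poisson rate is inferred from the 74% of scientists who do not appear in a single journal, and the helper names are hypothetical. It compares the output implied by one journal's frequencies with the output implied by the joint statistics of two independent journals drawing on the same 100 scientists.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k papers under a Poisson rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def joint_percentages(lam: float, max_k: int = 6) -> dict:
    """Percent of scientists by total papers across two independent journals."""
    totals = {}
    for a in range(max_k + 1):          # papers in journal A
        for b in range(max_k + 1):      # papers in journal B
            share = 100 * poisson_pmf(a, lam) * poisson_pmf(b, lam)
            totals[a + b] = totals.get(a + b, 0.0) + share
    return totals


# Assumption: the rate per journal is fixed by the 74% who do not publish.
LAM = -math.log(0.74)

# Papers per 100 scientists as read from one journal's frequencies.
single_view = sum(k * 100 * poisson_pmf(k, LAM) for k in range(7))

# Papers per 100 scientists from the joint statistics of both journals.
joint = joint_percentages(LAM)
joint_view = sum(k * pct for k, pct in joint.items())

for k in range(4):
    print(f"{k} papers in total: {joint[k]:5.1f}% of scientists")
print(f"single-journal view: {single_view:5.1f} papers per 100 scientists")
print(f"joint view:          {joint_view:5.1f} papers per 100 scientists")
```

With unrounded Poisson values the joint figure is exactly twice the single-journal figure; the 31 and 59 quoted in the letter differ from this only through rounding of the percentages.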

Since we do not know how much overlap exists, in fact, in the potential author pools of different scientific journals, we cannot determine the proper correction to apply to Mantell's estimate. Common observation tells us that the overlap is great and that the errors of estimation must be correspondingly large.

HERBERT A. SIMON
Carnegie Institute of Technology
Pittsburgh, Pennsylvania


Dear Sir: 


The Brief Communication by Howard Iker in your January 1967 issue describes a technique for solving Boolean equations that is essentially a programming technique and, therefore, restricted to use within the computer. The computation of the various sums (S) that would represent the “conditions of truth” for a given equation is far too difficult and time-consuming to require of an analyst in a real-life search situation.

I want to make this point because the weighted search described in my own Brief Communication in your July 1966 issue is basically a way to search in its own right and in some respects is easier for the analyst to use and to code than a logical equation. To provide an example: at the present time our SDI system is comprised of over 900 separate profiles, each written as a weighted search. The fact that our weighted search can also duplicate the logic of Boolean equations added to its usefulness. It is not and was not designed, however, as a programming technique for solving equations, but as an analyst's technique for writing search specifications. The basic difference in point of view between the two Communications is, therefore, plain.


As a programming technique Mr. Iker's system would appear not to be able to handle equations in which a given term makes more than one appearance. For example, see C in the following equation: (A + B) · [...] = ANSWER. C cannot be assigned values of both 2 [...] simultaneously and it would seem, therefore, that the technique, as described, is limited in the equations to which it can be applied.


W. T. BRANDHORST
Documentation Incorporated
Bethesda, Maryland


Dear Sir: 


Mr. Brandhorst's letter raises two issues about my Brief Communication in your January 1967 issue: (1) The method I described is a “programming technique” while that described by Brandhorst is an “analyst's technique”; (2) The method I described will not handle duplicated references to a term in a Boolean equation.

There is no question that the calculation of all possible solutions and the truth-value of each for an equation of any length is indeed a laborious process; indeed, it is a technique designed for a computer, and the reason I mentioned it, as such, was due to Brandhorst's statement that when using the technique he described, “complicated equations can be both difficult and laborious to code. . . . There is no reason that the program should not accept the equation and calculate its own weight assignments. This is now being evaluated.” Unless this evaluation has been decided negatively, I still would urge Brandhorst to evaluate the alternate technique I mentioned. This brings us to the second point raised.

Mr. Brandhorst's example of a duplicated term equation (with brackets added for greater clarity assuming the usual priority of “and” over “or”), [(A + B) · (C + D)] [...] = ANSWER, represents a set of five elements. Regardless of how many times any equation terms are repeated, a set of five binary elements may be construed in a maximum of 32 ways (including the null-set); there are simply no more combinations available. Given the method I mentioned, A = 1, B = 2, C = 4, D = 8, E = 16, each of these 32 sets has, by definition, a unique sum; each of these 32 sets may be evaluated for its truth value against the given equation. Accordingly, the equation is false if and only if S = 0-4, 8, 12, 16-18, or 24; all other results are true.
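
The enumeration described here is easy to mechanize. The Python sketch below is a minimal illustration, not the program discussed in these letters: it assigns the weights 1, 2, 4, 8, 16 to A through E, walks through all 32 combinations, and collects the sums S for which an equation is false. Because the printed equation is not fully legible, the equation used below, [(A + B) · (C + D)] + (C · E), is an assumed reading in which C appears twice, and the function names are hypothetical; the list it prints should not be taken as a substitute for the list given in the letter.

```python
from itertools import product

# Weights as given in the letter: each term gets a distinct power of two,
# so every combination of present terms has a unique sum S.
WEIGHTS = {"A": 1, "B": 2, "C": 4, "D": 8, "E": 16}


def equation(v: dict) -> bool:
    """Assumed reading of the example: [(A + B) . (C + D)] + (C . E).

    '+' is taken as "or" and '.' as "and"; note that C appears twice.
    """
    return ((v["A"] or v["B"]) and (v["C"] or v["D"])) or (v["C"] and v["E"])


def false_sums(eq) -> list:
    """Return the sorted sums S of all term combinations for which eq is false."""
    sums = []
    for bits in product((False, True), repeat=len(WEIGHTS)):
        present = dict(zip(WEIGHTS, bits))
        s = sum(w for name, w in WEIGHTS.items() if present[name])
        if not eq(present):
            sums.append(s)
    return sorted(sums)


print(false_sums(equation))
```

A repeated term causes no difficulty in this scheme because the weights are attached to the terms themselves, not to their positions in the equation; the equation is simply evaluated once for each of the 32 combinations.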
The computation of the truth value for each of these 32 sets is indeed tedious, but there seems little question that the process can be done by a computer and equally little question that the implementation of such a set of translated equation weights can be handled efficiently for term searching.

I hold no brief for the method as the better of the two suggested. What I am suggesting is that it not be ruled out on the grounds of difficulty in coding, which can be handled by a computer, nor of its inability to handle certain kinds of equations, since the latter does not seem to be true.


HOWARD P. IKER
School of Medicine and Dentistry
The University of Rochester
Rochester, New York


Book Reviews


2/67-1R Annual Review of Information Science and Technology. Volume 1. 1966. Carlos A. Cuadra, Editor. American Documentation Institute. Interscience Publishers, a division of John Wiley & Sons, New York, 389 pp.


This initial volume of the ADI Annual Review calls for kudos to the National Science Foundation, to System Development Corporation, to the American Documentation Institute, and especially to Carlos A. Cuadra, for making this needed and useful tool a reality and thus filling one of the gaps in the field of documentation (broadly defined). Critical annual reviews of the literature are essential in any vital field (even those who are aware of the reviewer's bias for communication will not quibble with this), so it is with satisfaction that we may greet the fruition of the lengthy efforts of Dr. Cuadra. It also is appropriate that Charles P. Bourne and Pauline Atherton are represented in this first volume, since their early encouragement led Dr. Cuadra to continue his efforts toward this publication.

Now, what do we have? In Dr. Cuadra's words, “a constructive review of topics of current interest to users, designers, and students of information systems and services.” This review is provided in 12 chapters, each the responsibility of its individual editor(s). One of the major tasks was the determination of these divisions and selection (and persuasion) of editors to prepare them. They begin, after an introduction, with Chapter 2, professional aspects (Robert S. Taylor) and Chapter 3, information needs and uses (Herbert Menzel). One of the studies reviewed here, John Martyn's “Literature Searching by Research Scientists” (pp. 58-59), points up the use of such a tool as this, plus the need for interpersonal communication with others working in similar fields. Chapters 4-6 concern technical problems: Chapter 4, content analysis (Phyllis Baxendale); Chapter 5, file organization (W. Douglas Climenson); and Chapter 6, automated language processing (Robert F. Simmons). Indexing systems (Chapter 7), despite its 110 references, is one of the chapters criticized for omission of some good work. This, I think, is almost inevitable in a work of this type. We should be grateful to its editor, Charles P. Bourne, for his excellent table, Brief Summary of Experimental Evaluation Projects Reported in the Literature (pp. 176-170), and also for his list of methodological questions that “would be good research targets for some of the students in the graduate library schools” (pp. [...]).

New hardware (Chapter 8, Annual Review Staff) and man-machine communication (Chapter 9, Ruth M. Davis) are followed by three chapters (Chaps. 10-12) on applications. Chapter 10, system applications (Jordan J. Baruch), sums up advances in the fields of business, chemistry, [...], law and patents, and the military, the latter naturally limited by what information has been released. Chapter 11, library automation (Donald V. Black and Earl A. Farley), is another area subjected to criticism, e.g., “Why was this paper by Bruce Stewart included and not his other one?”; or, “Why was ISO Planning Memo No. 3 not listed?” etc. There are any number of possible reasons, but this points up a problem of which we need to be aware: that reviews are selective. We cannot sit back comfortably with the book at hand and relax, thinking that now we have all we need. We do not. We do have a great deal more than we had before, and we need to use and support the work to see that this series will be continued. But we need to supplement it with things like Documentation Abstracts and other journal literature, and a constant “nose for news,” those interpersonal contacts that are such an important part of communication and awareness in an active, changing field. Here I also want to make a plea for publication. I know of a handful of good projects recently brought to successful completion in the [...] field which have not been reported on to anyone except [...]. Such knowledge and experience should be shared with the profession; it might well bring assistance on some of the problems as well as help others who are contemplating similar investments.


Which brings us also to Chapter 12, information centers (G. S. Simpson, Jr., and C. Flanagan). Of special interest here is the listing of “New Services of 1965” (pp. 318-320). The last chapter, Chapter 13, national issues and trends (John Sherrod), concludes with the statement of belief “that a working plan for setting up a national information system for science and technology will evolve in the near future” (p. [...]).

We have progressed to the keys to specific data in the text: the indexes (Pauline Atherton and Stella Keenan). These 31 pages serve their purpose well, with boldface indicating chapters, and with useful cross-references. The appended Acronym/Abbreviation List is very helpful. My two small quibbles concern the M's in the index. I am puzzled to find the Mc's at the beginning of the list (before “Ma,” “Maass,” etc.) in what I understand is a “hand-made” list. This does not follow any standard alphabeting rule of which I am aware. The other item is more unfortunate: the name of J. J. Magnino (p. 156) is misspelled to “Mangino, J. J.,” and filed as such. However, I think those who know his work would glance at the page, as I did, long enough to spot his initials several lines farther down. (The book as a whole is remarkably free of typographical errors; I found only four others.)

To sum up, we have a fine piece of work here, and a useful one. It is now up to us to see that the work will be continued. Dr. Cuadra is continuing as editor for Volume 2, to cover 1966 (calendar year) literature. We need to write reports on projects, and send a copy to him, as our part in aiding the communication process. We also need to let him hear our suggestions. I have one, which may be shared by others who have worked on United States of America Standards Institute subcommittees: The Institute's recommendation for indexes is that they be in dictionary arrangement in a single alphabet; I suggest that this arrangement be adopted for future indexes. In my opinion, 1965 represents the high-water mark of Dr. Cuadra's contributions to ADI and the profession.

Nathalie C. Batts
Columbia University Libraries 


2/67-2R Looking Forward in Documentation, Papers
and Discussion, Aslib 38th Annual Conference, University
of Exeter, 1964. 1966. Aslib, London. 109 pp.


Whether these 17 papers and the discussions that they
stimulated actually are, as the title purports them to be,
a perspective of future developments in documentation is
at least debatable. To be sure, the contents of this volume
do reveal in rather striking fashion the growing interest in
and experiences with varying systems of information retrieval
as they have developed in Great Britain during
recent years, and in this respect the compilation stands out
in marked contrast to the typical attitude toward documentation
that characterized British documentalists and
librarians of only a short time ago. But certainly the work
reported can scarcely be regarded as pioneering, and much
that is said in these pages will be well known to an American
audience.

Of the 17 papers that comprise this collection, 11 are
concerned with information storage and retrieval, 2 with
library use and user needs, and 4 with primary and secondary
publication, including a particularly interesting
paper by Sir Thomas Scrivenor on the growth of scientific
literature. This distribution of topics probably represents,
as Christopher Hanson points out in one of the discussion
sessions, the current balance of effort and interest among
documentalists in the British Isles.

Our existing store of information is not materially augmented
by the papers on information retrieval which, as the
above statistics indicate, make up a substantial portion of
the entire volume. T. N. Shaw, of the Unilever Research
Laboratories, describes, for example, a coordinate-indexing
system for company reports, for the organizing of which he
uses a card sorter. His associate at Unilever, H. East,
reviews the use of the IBM 1620 computer and the IBM 870
Document Writing System for the processing of published
information. Other papers follow the same general [expository]
pattern. Th. W. te Nuyl, from Shell International
Research at the Hague, describes the “l'Unite” system using
a Texoler Sorter for multiple search. Mannix and Whitehall,
also of Unilever, describe a punched-card indexing
scheme for organic chemistry, using a fragment code that
makes possible the identification of reactions. R. Moss, of
Shell Chemical of London, considers the problems of
vocabulary control when using Batten (Peek-a-Boo) cards
and urges the need for further research in that area.

R. C. M. Barnes, of the Atomic Energy Research Establishment
at Harwell, argues inconclusively and without
visible supporting evidence that computers “can have only
a [marginal] effect upon the quality of retrieval.” J. R.
Sharp, of British Nylon Spinners Ltd., describes an experimental
index, “Slic” (for Selective Indexing in Combination),
in a paper that has been subsequently published in his
text, Some Fundamentals of Information Retrieval. J. C. R.
Yeates, of the Rowett Research Institute, describes an
index created by breaking down titles into their syntactical
elements, then reassembling them into a statement that can
be used as a basic index entry; additional entries are
produced by permutations of this statement. In devising
this system, which might be characterized as a slow KWIC
index, he has identified 15 types of syntactical elements:
Substance, Property, Aspect, Process, Agent, Modification,
Modifier, Contrast, Locus, Environment, Condition, Time,
Viewpoint, Intention, and Literary Form, all of which
makes Ranganathan’s famous PMEST formula appear surprisingly
simple.
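
The procedure attributed to Yeates above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not his system: the review does not say which permutations are generated, so cyclic rotation is assumed here, and the sample title, its role assignments, and the function names are all hypothetical.

```python
from typing import List, Tuple

# One syntactical element: (role, term). The roles come from the 15 types
# listed in the review; the sample analysis below is hypothetical.
Element = Tuple[str, str]


def statement(elements: List[Element]) -> str:
    """Reassemble role-tagged elements into a single index statement."""
    return " : ".join(f"{term} ({role})" for role, term in elements)


def permuted_entries(elements: List[Element]) -> List[str]:
    """Return the basic entry plus one entry per cyclic rotation.

    Cyclic rotation is an assumption; it brings each element to the
    filing position exactly once.
    """
    entries = []
    for i in range(len(elements)):
        entries.append(statement(elements[i:] + elements[:i]))
    return entries


# Hypothetical analysis of the title
# "Effect of temperature on the growth of barley".
title_elements = [
    ("Substance", "barley"),
    ("Process", "growth"),
    ("Agent", "temperature"),
]

for entry in sorted(permuted_entries(title_elements)):
    print(entry)
```

Each element of the title becomes a filing point in turn, which is what makes the result resemble a KWIC index; the human analysis of each title into roles is the slow part.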

H. J. Zwillenberg, of the Weapons Research Establishment
at Salisbury, South Australia, finds that the use of
computers for the preparation of bibliographies is economically
practicable, even for small-volume information
requirements. C. D. Batty, of the Birmingham School of
Librarianship, holds that the primary reason for including
a consideration of mechanized information systems in the
curriculum of the library school is “to demonstrate [the]
use [of such systems] to student librarians who will then
have relevant experience when later they work with and
perhaps introduce similar systems.”

The topics considered at the conference which were not
directly related to information retrieval were, in the
opinion of this reviewer, of considerable interest, especially
the one by Sir Thomas Scrivenor, of the Commonwealth
Agricultural Bureau. Sir Thomas points to numerous discrepancies
in published figures purporting to prove the
existence of a “literature explosion” and advances specific
suggestions on how best to quantify the true amount of
scientific publication. Really valid data, he rightly believes,
would “convert Cassandra-like prophecies of impending
catastrophe into verifiable factual statements.” Barnes, at
Harwell, finds that inquiries suitable for computer processing
account for only a small proportion of the total received
by his information office, “perhaps only nine per cent and
certainly not more than thirty-five per cent.” Furthermore,
delays resulting from the time required for the preparation
of a search plan and the computer search itself “would result
in a service appreciably slower than that provided by
present methods.” John Martyn and Mrs. Margaret Slater,
of Aslib's research department, present some very interesting,
though tentative, conclusions concerning the behavior
characteristics of users of scientific information in libraries;
their study should be substantially extended. Gordon Y.
Craig, of the University of Edinburgh, urges publishers of
scholarly journals to initiate the issuing of abstract cards
for the articles in their periodicals. F. Liebesny, of British
Aluminum Ltd., presents Aluminum Abstracts as a laudable
[example of] international cooperation in the abstracting field.
Morris, of University Microfilms Ltd., writing
under the intriguing and not too permutable title of “From
One to Two-Fifty,” proposes the formation of an international
agency to collect and package requests for microphotographic
copies of wanted materials. Such packaging
would, he believes, substantially reduce the costs of microfilm
requisition. Phyllis I. Edwards, of the Department of
Biology, British Museum, believes, after a survey of users
of abstracting services in the biological sciences, that a
British association of science abstracting and indexing services,
comparable to that now active in the United States,
is badly needed to coordinate and otherwise improve the
quality and coverage of such facilities.

Unfortunately, the most interesting papers of all are missing
from this collection; these papers might have reported
the tentative results of research in progress at Aslib. But
the investigations are listed in the present volume only by
title with a brief note on the aim, method, and progress to
date of each. One would like to know much more about
such investigations as: a case study in depth of the information
needs of scientists; the demands of users of technical
libraries; economic factors in I. R. systems; comparative
tests of small and large systems in the same
subject field; and the fabrication of model indexes. All
these topics are of extreme interest to American librarians
and documentalists, and their published results will be
eagerly awaited.

If the reviewer may be pardoned a nationalistic note, he
would like to point out that, though the book under
review is not heavily documented, of the total of 42 citations
it records, exactly half (21) are from American sources.
The remainder are divided between England (11) and other
countries (10). Such statistics may have no meaning whatever,
though they might seem to suggest that the British
are at least keeping a watchful eye on the documentation
efforts of their American cousins. But whatever conclusions
one can or cannot draw from such a count, the fact remains
that the librarians and documentalists on John Bull’s
Island have made substantial progress in the past decade
in developing unconventional approaches to the subject
analysis of library and bibliographic materials. We can
well remember the conference sponsored by Aslib at
Dorking in 1957, when the group, by fiat, declared that any
mention of computers and mechanization must be limited
to “Thursday morning when the Americans can have their
say.” The mention of machines at any other time was
strictly verboten. One cannot but wonder whether the slow
entry of the British into mechanized retrieval activity has
been due to canny native caution or inadequate financial
resources.

J. H. SHERA
School of Library Science
Western Reserve University


2/67-3R Information Management in Engineering
Education. (Proceedings and recommendations of the Conference
on Information Sources, Systems and Media in
Engineering Education, held at Lehigh University on May
19-20, 1966). 1966. Robert S. Taylor, Editor, Lehigh University,
Bethlehem, Pa. 76 pp.


Surely no one is apt to criticize the goal of this confer- 
ence, for the object was to develop a national plan for 
improving the utilization of technical information by engineers.
The main objective was to produce better courses
for engineering students in which they would learn to 
appreciate and evaluate the many facets of such informa- 
tion. However, not everyone will agree with the general 
makeup of the body of participants at the conference or 
the recommendations in the plan they produced. 

The conference was jointly sponsored by the American
Society for Engineering Education and by Lehigh University.
The report lists only 23 participants. Aside from the
editor and one or two others, the participants seemed to be
either professors of engineering or from technical information
departments (such as computer centers and the like)
in industry or universities. The term “information management”
was coined by one of the participants to signify
the understanding of and utilization of information sources,
media, [and systems]. The participants obviously were not
content [merely to talk] over the problem of better utilization
of [information], for they came up with a very detailed
plan of action that included at least three pages of flow charts
and numerous lists and categories. In fact, it was this very
amount of detail that made this report so unlike the usual
conference proceedings. Aside from the account of the
introductory talk by the conference chairman, the remainder
of the report reads like the final document of a study commission,
with summaries and a thorough discussion of all the
aspects of a problem. There is also a two-page list of references,
which was given the rather unusual listing in the table
of contents as a “suggestive bibliography.” One fault with
the references was that so many were to unpublished literature.
But, in general, one admires the goal of the group and
their diligence in preparing such a comprehensive plan.

There was a general summary plus the reports and recommendations
of four panels. The introductory talk by the
chairman gave background information on the relationship
of engineers to information utilization and also listed several
questions for the conference to answer. The panel on
Objectives described the outline of their national plan, as
well as discussing proposed courses for the new project.
The panel on Curriculum elaborated on the contents of the
courses, including a comparison of the “problem” method
of presentation versus the “formal” method. The panel on
Logistics and Implementation gave more of the details of
the national plan, including summer training institutes for
instructors of the courses, setting up task forces for
[testing] the curriculum, etc. The panel on Support and
Evaluation dealt with the establishment of a permanent
secretariat, with funding (at least $1,300,000 for a three-year
program) and various evaluation techniques.

Regardless of the outcome of this particular program, it
is at least encouraging to see such interest by engineers
and engineering school faculties in improving these conditions.
However, one aspect of the conference that seemed
to me to be regrettable was the apparent lack of participation
by those now engaged in the teaching of courses in this
[field] for engineering students. Courses have been given
for many years at schools all over the U. S. Did the conference
have the benefit of the experience of any of these
[instructors], or even of a spokesman for the group? If so,
it is not apparent from the list of participants. Surely some
benefits could have been derived from hearing about
their successes and failures in actual classroom work.

Another point that bothered me was the recommendations
for instructors for the new courses. It appears that engineers
on the regular engineering school faculties will be the
chosen ones. For this reason, one part of the plan was
devoted, at considerable length, to summer training institutes
for preparing the engineers for the teaching of the
new courses. This seems again to overlook the contribution
that could be made by those now engaged in such work.
Granted that the existing courses are not perfect, it would
seem to me most likely that present instructors, if they
have any competence at all, would be pleased to work
towards improving the course content and methods of
presentation, and that this upgrading could be done more
quickly and more easily than starting more or less from
scratch with the recruits to go to summer training institutes.
Almost any engineering school worth its salt should have
technical librarians and documentalists and data processing
managers who, [singly or together], could do a very creditable
job of teaching such courses. Many of these people have
had collegiate training, or even degrees, in the sciences and
engineering as well as training in their library or documentation
duties. It is not unusual to find librarians who are
familiar with all the latest trends in using computers, or
documentalists who are familiar with the reference tools
used in literature searching, etc. On the other hand, engineers,
unjustly or not, do not enjoy a good reputation as
being very familiar with technical information resources in
general and as a class are not usually rated as being “heavy”
users of this information. Why not use the people whose job
it is to organize, select, use, and manipulate technical information
on a daily basis to serve as the instructors of the
improved courses?


This report may very well be followed by a [gradual]
improvement in the utilization of technical information.
The improvement is long overdue. But let us hope that it
is done with the active participation of those who are experienced
already in such matters. Librarians and documentalists
have been concerned about this problem for
many, many years. Why not put them on the team too?


Ellis Mount
Science & Engineering Librarian
Columbia University


2/67-4R Education and Training of Information Specialists
in the U. S. A. May 1966. Marilyn C. Bracken and
Charles W. Shilling. Biological Communication Project,
Washington, D. C. 70 pp.


This is a brief survey of the current status of information
education, which proves to be a larger and more varied field
than most readers will realize. The field is interpreted
broadly to include some computer instruction, often administered
centrally in the computer center, as well as some
traditional library science course work. Fifteen doctoral
programs are included, but the possibility of getting a strong
concentration in information science is limited to only a few
of them, most of them offering little more than library
science. There is a list of the ALA-approved library schools
with an indication of the information science courses
offered. Another section of this paperbound booklet describes
the programs of 20 leading schools in two or three
pages apiece. The faculty lists are sometimes helpful but
include many persons only remotely concerned with information
science education. Another table lists professional
associations and their interest in information science education.
A bibliography of recent information concludes the
booklet.

A few of the items of information are either incorrect 
or misleading, but in general this is a useful compilation that
will go out of date quickly. While it is current, those for 
whom superficial factual data on the field is important will 
need it in their offices. 


JOHN HARVEY, Dean
Graduate School of Library Science
Drexel Institute of Technology


2/67-5R A Plan for Indexing the Periodical Literature
of Nursing. Report of a Study of the Need for Bibliographic
Control of the Scholarly Record of Nursing.
1966. Vern N. Pings. American Nurses Foundation, New
York. 202 pp.


The serial literature of nursing is indexed by two organi- 
zations. This monograph recommends that one of these 
two continue indexing such literature, and that other organizations
interested in the literature of nursing cooperate with
it in disseminating the results. A plan for the National
Library of Medicine routinely to index the nursing literature
found in serial titles covered by Index Medicus is succinctly
stated in part of the book. The rest of the book is
devoted to background on the history and development (or 
lack thereof) of nursing librarianship and bibliographic con- 
trol, accompanied by statements on both nursing education 
and the profession. 

In actual fact, the plan [recommended] is already in operation.
In the book, it would be helpful if the justification of the
need for the particular index plan recommended were as
strongly presented as is the conclusion that the index is
needed and that this particular plan for accomplishing it
through the facilities of the National Library of Medicine’s
generalized system is the means that best serves those interested
in the nursing literature. Comparisons of an analysis
of the products of the Cumulative Index to Nursing Literature
with that of the National Library of Medicine are
made. Not sufficiently emphasized is the importance of
NLM's routinely screening the journals covered by Index
Medicus over the fact that NLM uses a computer in manipulating
the citations. Only a beginning is made on enumerating
the inherent and administrative drawbacks of the
operational computer system used. There was a technical
oversight in calling GRACE (Graphic Arts Composing Equipment)
“graphic arts company equipment,” which suggests
that the NLM system was perhaps not observed closely
enough.

The monograph is [strongest] when presenting background
information on nursing librarianship and bibliographic control,
and in its many tables of data illustrating the quantification
and categorization of data related to the subjects
covered in the various chapters, titled as follows:


1. The Quest for Quality in Nursing
2. A Review of the Efforts of Professional Groups to
Achieve Bibliographic Control of the Literature of
Nursing, 1900-1964
3. Analysis of the Coverage of Nursing Literature in
the Medical Literature Analysis and Retrieval System
4. Comparison of Indexing in Cumulative Index to
Nursing Literature with Indexing in MEDLARS
5. Quantitative Characteristics of Journal Literature of
Interest to Nurses
6. The Possible Users and Purchasers of an Index to
the Periodical Literature of Nursing
7. Cost of Publishing an Index to the Journal Literature
of Nursing Utilizing MEDLARS Facilities
8. Possible Administrative Relationships among Agencies
now Concerned with the Bibliographic Control
of the Journal Literature of Nursing
9. A Review of the Published Literature on Nursing
Libraries, 1900-1963
10. Access to the Scholarly Record of Nursing
11. Study of a Plan for Indexing the Periodical Literature
of Nursing


The merit in and the importance of the book lie in its
example of an approach to [analyzing] and evaluating information
systems, and in its revealing the condition of nursing
bibliographic control, complete with extensive references.
These alone more than justify its purchase and perusal. In
addition, it records the background that led to the cooperative
effort between the National Library of Medicine and
the American Journal of Nursing Co. in producing the
International Nursing Index, which may set the pattern for
similar ventures in other medical and paramedical fields
where NLM is indexing literature and the specialty field
concerned is hard pressed to publish and index
the literature for its area.


Hayom GARCIA CLARK
Librarian, Chestnut Lodge
Rockville, Maryland


2/67-6R Indexing and Classification: A Selected and
Annotated Bibliography. A joint project of the Nuclear
Science Division and Documentation Division of the Special
Libraries Association, May 14, 1966. Compiled and
edited by Winifred F. Desmond and Lester A. Barrer.
Oak Ridge National Laboratory Library, Oak Ridge, Tennessee.
256 pp. (on microfiche)


Two divisions of the Special Libraries Association have
here joined together to produce the first SLA publication
in microfiche form. The 356 typescript pages of this annotated
bibliography of recent publications in the fields of
indexing and classification are contained on six 4 X 6 microfiches.

The 635 entries in this bibliography are intended to cover
the field fairly comprehensively. Indexing and classification
are interpreted quite broadly to include general books on
data processing and information retrieval. Citations are
given for technical reports, monographs, periodicals, conference
proceedings, etc.

But the scope is somewhat confusing. The list is called
a selective bibliography because of the omission of classification
schemes, glossaries, and vocabularies. But a few
outstanding examples of this latter group are included as
samples. The time period covered is January 1, 1960, to
mid-1964. But a few publications of the 1950’s are included
because of their special significance. The citations are
essentially to English-language publications. But a few
items in other languages are included. With all these
exceptions, it is probably worthwhile to look for almost
any important publication on indexing of the last 15 years
or so in this bibliography, even if it falls outside the formal
limitations.

The entries are arranged alphabetically within a broad
classified order. Reports are entered under the organization
responsible for the research. Journal articles and monographs
are entered under personal author, corporate author,
or title. Papers in proceedings are also entered under
author.

Information is supplied about the availability of technical 
reports. When a report is available through the Clearinghouse
for Federal Scientific and Technical Information, the
price is given. 

The annotations for each entry come from American
Documentation, Computing Reviews, Computer Abstracts, 
and a few other journals. When no annotation was avail- 
able in the literature, the editors provided one of their own. 
Beneath each abstract are listed the indexing terms as- 
signed to the document. 

Some interesting statistics on the nature of indexing literature
are provided in the Introduction. A frequency count
was made for the types of sources of entry included in the
Bibliography. Journal articles led with 229. Conference
literature followed with 146; report literature was close
behind with 125. Of the 61 journals that contributed entries,
American Documentation led with 43 articles. The
Journal of Chemical Documentation followed with 38. No
other periodical source came even close.

The most potentially interesting (and ultimately disappointing)
feature of this bibliography is its set of indexes.
Besides a computer-produced author index (enriched manually
when authors were not main entries), two subject indexes
are provided. One is a computer-produced KWIC
permuted title index; the other is a manually produced
subject index. The Preface describes this as an opportunity
to make a comparison between computer KWIC indexing
and manual keyword indexing.

It is not clear what sort of comparison was really intended.
Manual techniques can achieve more sophisticated
indexing than is now possible with a computer. The only
valid points of comparison would seem to be relative times
and costs for similar levels of indexing. But no such figures
are provided.

What is more disappointing is that neither index is really
adequate. The KWIC index falls into many of the expected
traps. Words like “indexing,” “information,” and “retrieval”
appear in great abundance, filling up pages without being
much help to the user of a bibliography on indexing and
information retrieval. There is no vocabulary control,
which means there is much filing under synonyms and under
different forms of the same word, e.g., “index,” “indexes,”
and “indexing.” The only aid given the user is the context
of the title in which the term appeared.
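
The trap described above is easy to reproduce. The following Python sketch is offered only as an illustration, not as the program used for the bibliography: the sample titles, the stop list, and the `preferred` table of see-references are all hypothetical. It files each title under every word not on the stop list, and optionally maps variant word forms to one preferred heading.

```python
from collections import defaultdict

# Hypothetical stop list of function words; a real list would be longer.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

# Hypothetical titles, chosen to show scattering across word forms.
SAMPLE_TITLES = [
    "Automatic indexing of reports",
    "An index to nursing literature",
    "Indexes and information retrieval",
]


def kwic(titles, stop_words=STOP_WORDS, preferred=None):
    """Build a keyword-in-context index: {keyword: [rotated title, ...]}.

    Without `preferred`, every word form is filed exactly as it appears.
    `preferred` maps variant forms to one heading, a crude stand-in for
    vocabulary control.
    """
    preferred = preferred or {}
    index = defaultdict(list)
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()\"'").lower()
            if not key or key in stop_words:
                continue
            key = preferred.get(key, key)
            # Rotate the title so the keyword leads; "/" marks the wrap.
            context = " ".join(words[i:] + ["/"] + words[:i])
            index[key].append(context)
    return dict(sorted(index.items()))


uncontrolled = kwic(SAMPLE_TITLES)
controlled = kwic(SAMPLE_TITLES, preferred={"index": "indexing", "indexes": "indexing"})
print(sorted(uncontrolled))
print(sorted(controlled))
```

Under these assumptions the uncontrolled index scatters the three titles under three separate headings for one concept, while the controlled run gathers them under “indexing.” Gathering them is only half the remedy: in a bibliography devoted to indexing, such a heading collects nearly every entry and so tells the user little.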

The manual index would have served the purposes of
comparison if it had simply shown how far our machine
techniques must still be developed before they can achieve
the quality of good manual indexing. But what we have
here is not good manual indexing. The subject terms are
drawn from the titles and the abstracts. Thus, we are
given a number of subject entries missing from the permuted
title index. But we again find entries like “indexing”
and “automatic information processing” followed by long
lists of identifying document numbers. There isn’t even
any context here to serve as some sort of aid in differentiation.

In the case of the KWIC index, we can say that the
machine does not know any better. But there is no excuse
for including the term “indexing” in a manual index devoted
The 1967 Annual Convention of the American Documentation Institute will be held in New York City at the
Hilton Hotel, October 22-26, 1967. The convention's theme is


LEVELS OF INTERACTION BETWEEN MAN AND INFORMATION


We invite you to present a short paper which will amplify our theme by reporting on related and significant
trends and achievements. Papers should, if possible, conform to one of these outlined topics:


THE CREATOR OF INFORMATION
. the creative writer, the scientist, the graphic artist, the editor and the publisher.


THE USER OF INFORMATION
. using information in the business world, [man-machine] interface, psychology and
information, language and information.


THE HANDLER AND PACKAGER OF INFORMATION
. traditional methods of organizing and storing information, new approaches to
organizing information, wares of information services (indexes, catalogs), information
sharing: the emerging network.


The short paper should not exceed 2000 words. Text, illustrations, bibliography, etc. must fit on five 8-1/2 x 11"
typewritten pages. Detailed instructions for camera-ready copy will be sent on receipt of the attached reply form.


Authors will be grouped by topic for panel discussions and allowed to give ten-minute precis of their
papers. All selected papers will be printed and distributed at the convention.

DEADLINES:

JUNE 1, 1967
The Program Chairman should be notified by June 1, 1967 of your intent to
submit a paper. Use attached form.

JULY 1, 1967
Short papers should be submitted to the program chairman by July 1, 1967.

AUGUST 1, 1967
Authors of selected short papers will be notified of selection by August 1.


SEND REPLY TO: Paul Fasana, Program Chairman
ADI 1967 Annual Convention
Columbia University, The Libraries
New York, N.Y. 10027


REPLY FORM - ADI 1967 CONVENTION


To: Paul Fasana
The Libraries
Columbia Univ.
New York NY 10027

I plan to submit a short paper in the following subject area
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Consolidates the latest developments in the growing field 


of inquiry concerned with information. 


Annual Review of Information Science and Technology 


American Documentation Institute 


Volume 1 


Edited by CARLOS A. CUADRA, 
System Development Corporation. 


This series is intended to be of tangible benefit to 
individuals, public and private organizations, uni- 
versities, and government agencies. Although the 
field cuts across numerous disciplines, the people 
involved in it all share a concern with the genera- 
tion, transformation, communication, storage, re- 
trieval, and use of information. This volume, and 
the ones which will follow it, are planned as con- 
structive reviews of topics of interest to users, de- 
signers, and students of information systems and 
services. By providing a perspective on the infor- 
mation field, such reviews not only point out gaps 
and duplicative work but also serve as sources of 
new ideas. ; 


A long-range goal for the ADI Annual Review is 
to encompass the larger communication process 


a mm“ 


of related 


National Document Handling Systems 
for Science and Technology 


By LAUNOR F. CARTER, GORDON CANTLEY, JOHN 
T. ROWELL, Louise SCHULTZ, HERBERT R. SEIDEN, 
EVERETT WALLACE, RICHARD WATSON, and RoN- 
ALD E. Wy Ltys, all of the System Development 
Corp., $anta Monica, California. 


This new book is the result of the COSATI- (Com- 
mittee on Scientific and Technical Information) 
sponsored study on national information systems 
relating to scientific and technical documents. 
Presented in a clear and well organized manner, 
the emphasis of the study, as stated by COSATI, 
13 — 
1. Initial and primary priority will be placed on 
national systems relating to scientific and tech- 
nical documents, their handling and the man- 
agement of such documents. 


2. Secondary attention will be given to develop- 
ment of programs which can be undertaken 
with government support for identifying, ana- 
lyzing, and giving a structure to the total flow 
of scientific and technical information in this 
country. 


1967 344 pages “$9.95 


in which documentation plays such an important 
role. To this end, the authors have made an effort 
to examine not only the obvious literature but also 
that in psychology, sociology, communication, 
engineering, management, business, and other 
fields that have a significant bearing on the com- 
munication process. The Annual Review will at- 
tempt not merely to reflect current interests; it 
will also attempt to broaden and deepen them. 


ADI’s Annual Review will be keyed to the calendar 
year. The present volume covers, for the most 
part, literature that appeared in the calendar year 
1965. Some of the earlier literature is also re- 
viewed because this volume, as the first in the 
series, had no prior context to serve as a frame of 
reference. Volume II, which will appear in the
fall of 1967, will cover 1966 literature. An Inter- 
science Series. 1966 389 pages $12.50 


A discount of 15% is available to members of the 
ADI if volumes are ordered through the Institute. 


interest . .. 


The Analysis of Information Systems 


A Programmer's Introduction
to Information Retrieval 


By CHARLES T. MEADOW, IBM Corporation. 


Treats information retrieval as a communication 
activity among a user, a library, and an author. 
The author assumes some computer background. 
1967. 301 pages. $11.50. 


Growth of Knowledge 


Readings on Organization 
and Retrieval of Information 


Edited by M. KOCHEN, 
The University of Michigan. 


A selection of essays on information retrieval, 
stressing the importance of evaluating and synthe- 
sizing newly generated knowledge into a coherent 
overall image. (A volume in the Library of Behavior
Science Series). 1967. Approx. 368 pages.
$12.95


John Wiley & Sons, Inc. 
605 Third Avenue, New York, N. Y. 10016 


unique information services


1. ASCA® (Automatic Subject Citation Alert): our computer
searches the literature as fast as it appears and
alerts you each week to specific items relevant to your
interests.


2. Science Citation Index®: for the period indexed, tells
what works cite specific earlier works, providing retrospective
searching. Published quarterly; cumulated annually.
Available for 1967, 1966, 1965, 1964 and 1961.


3, 4, 5. Current Contents®: your weekly guides to what’s
appearing in more than 1,600 domestic and foreign journals.
Published in three editions: Physical Sciences, Life
Sciences and Chemical Sciences.


6. Index Chemicus: weekly graphic abstracting and
indexing service for researchers who need fast, accurate
and thorough reports about new chemical compounds and
their syntheses.


7. Encyclopaedia Chimica Internationalis cumulates
Index Chemicus™ yearly with specialized rapid-search
indexes for retrospective searching. Volumes for 1966,
1965, 1964, 1963, 1962-63, 1960-62 available separately
or as complete 23-volume set.


8. ISI Search Service: when informational problems tie
up your work, personalized searches by ISI information
scientists bring fast, pertinent answers.


9. ISI Magnetic Tapes: delivered weekly for use in your
own information system to search the most comprehensive
literature file available anywhere.


10. OATS™ (Original Article Tear Sheets): one-day delivery
of the original journal pages of any article
abstracted or indexed by any ISI services.
AMERICAN DOCUMENTATION


INSTRUCTIONS TO AUTHORS


American Documentation is a publication of the American
Documentation Institute. It is a scholarly journal in the
various fields in documentation and serves as a forum for
discussion and experimentation. Papers already published or
in press elsewhere are not acceptable. For each proposed
contribution, one original and two copies (in English only)
should be mailed to Mr. Arthur W. Elias, Editor, American
Documentation, Institute for Scientific Information,
325 Chestnut St., Philadelphia, Pennsylvania 19106. The
manuscript should be mailed flat in a suitable-sized envelope.
Graphic materials should be submitted with suitable
cardboard backing.


TYPES OF MANUSCRIPTS: Three types of contributions are
considered for publication: full-length articles, brief communications
of 1,000 words or less, and letters to the editor.
Letters and brief communications can generally be published
sooner than full-length manuscripts. Books, monographs,
and reports are accepted for critical review. Two
copies should be addressed to the Review Editor, Dr.
T. Hines, 54 North Drive, East Brunswick, New Jersey.


PROCESSING: Acknowledgment will be made of receipt of
all manuscripts. American Documentation employs a reviewing
procedure in which all manuscripts are sent to two
referees for comment. When both referees have replied,
copies of their comments are sent to authors with the
Editor’s decision as to acceptability. The refereeing procedure
requires about 30 days. Authors receive galley proofs
with a five-day allowance for corrections. Standard proofreading
marks should be employed. Reprint order forms are
forwarded with galleys.


FORMAT: All contributions should be typewritten on white
bond paper on one side only, leaving about 1.25 inches (or
3 cm) of space around all margins of standard, letter-size
(8.5 X 11 inch) paper. Double spacing must be used throughout,
including the title page, tables, legends, and references.
The first page of the manuscript should carry both the first
and last names of all authors, the institutions or organizations
with which the authors are affiliated, and notation as
to which author should receive the galleys for proofreading.
All succeeding pages should carry the last name of the first
author in the upper right-hand corner (0.5 inch from the
top) and the number of the page.


STYLE: In general, style should follow the forms given in 
the Style Manual for Biological Journals (SMBJ), published 
for the Conference of Biological Editors by the American 
Institute of Biological Sciences (1964).


TITLE: The title should be as brief, specific, and descriptive
as possible. Vague and unrevealing titles may delay
publication.

ABSTRACT: An informative abstract of 200 words or less 
must be included, typed with double spacing on a separate
sheet. This abstract should present the scope of the work, 
methods, results, and conclusions. 

ACKNOWLEDGMENTS: Financial support may be listed as
a footnote to the title. Credit for materials and technical
assistance or advice may be cited in a section headed
“Acknowledgments,” which should appear at the end of
the text. General use of footnotes in the text should be
avoided.


GRAPHIC MATERIALS: American Documentation requires
finished artwork. Follow the style in current issues for layout
and type faces in tables and figures. A table or figure
should be constructed so as to be completely intelligible
without further reference to the text. Lengthy tabulations
of essentially similar data should be avoided.


Figures should be lettered in black India ink. Charts
drawn in India ink should be so executed throughout, with
no typewritten material included. Letters and numbers appearing
in figures should be distinct and large enough so
that no character will be less than 2 mm high after reduction.
A line 0.4 mm wide reproduces satisfactorily when
reduced by one-half. Graphs, charts, and photographs should
be given consecutive figure numbers as they will appear in
the text; however, figure numbers and legends should not
appear as part of the figure, but should be typed double
spaced on a separate sheet of paper. Each figure should be
marked lightly on the back with the figure number, author's
name, complete address, and shortened title of the paper.


For figures, the originals with two clearly legible reproductions
(to be sent to referees) should accompany the
manuscript. In the case of photographs, three glossy prints
are required, preferably 8 X 10 inches.


ORGANIZATION: In general, papers should state the background
and purpose of the study, followed by details of
methods, materials, procedures, and equipment. Findings,
discussion, and conclusions should appear in that order.
Appendixes may be employed where appropriate for extensive
lists, statistics, and other supporting data.


BIBLIOGRAPHY: Accuracy and adequacy of the references
are the responsibility of the author. Therefore, literature
cited should be checked carefully with the original publications.
References to personal letters, abstracts of verbal
reports, and other unedited material may be included. If
an as-yet-unpublished paper would be helpful in the evaluation
of a manuscript, it is advisable to make a copy of it
available to the Editor. When a manuscript is one of a
series of papers, the preceding member of the series should
be included in literature cited.
CITATION FORMAT:

Order: Literature cited should be sequentially numbered
as cited.

Authors: Give all authors with arrangement as follows:
Elias, A. W., B. H. Weil, and I. D. Welt

Titles: Give full titles of articles in English, indicating
language of original as: (In Ger.)

Journals: Journal titles should be given in full.

MONOGRAPH AND SERIAL DATA: Should be presented in
order as follows: Volume, issue number, pagination, and
year. The issue number should be given in parentheses if
journal pagination is not continuous from issue to issue.
Pagination should be inclusive. Year of publication should
be given in parentheses. An example is given below:

Bishop, D., A. L. Milner, and F. W. Roper, Publication
Patterns of Scientific Serials, American Documentation,
16 (No. 2): 113-21 (1965).

American Documentation is published in January, April,
July, and October. One copy is included in the individual
membership fee ($20.00 per year), three copies in the contributing
membership fee ($100.00 per year), and up to five
copies in the sustaining membership fee ($500.00 per year).
Nonmembers may subscribe at $18.50 per year, postpaid in
the US. Single copies may be purchased for $4.65 each.

Communications concerning memberships, subscriptions, reprints,
renewals, back issues, advertising, and changes of
address should be sent to the American Documentation
Institute, 2000 P Street, NW, Washington, D. C. 20030.

American Documentation is indexed in Library Literature,
Current Contents of Space, Electronic & Physical
Sciences, Library Science Abstracts, Science Citation Index,
Chemical Abstracts, and Documentation Abstracts.

American Documentation is entered for second class mailing
at Baltimore, Maryland.
Editorial


Three new projects are scheduled for American Documentation for 1967 and 1968. All
are designed to keep ADI members and AD subscribers in the closest possible contact with
our rapidly expanding field while improving the quality of the journal.

The first project was suggested by Dr. Eugene Garfield and represents a logical extension
of our refereeing procedure, coupled with the use of these pages as a forum for discussion
and review of ideas. Up till now such discussion has been limited in length to that
appropriate for our Letters and Brief Communication sections. Now, the Editor actively
solicits your contribution of Opinion Papers. These contributions will have the same treatment
and format as regular articles, but will allow for extensive discussion of an opinionated
type. The first of these is scheduled for the October issue and, we hope, will set the
pace for a succession of analytical treatments, challenging and provocative in nature.

Continuing this concept, the second project was suggested by the Central Ohio Chapter
in connection with their host responsibility for the 1968 Annual Meeting. Authors of all
papers published in 1968 (Issues 1-3) will be invited to a special Author Forum at the
Annual Meeting. At this forum the published papers will be laid open to further examination,
question, and discussion, a procedure designed to be of great interest to authors and
Editor alike.

Finally, the Editor has already received and earnestly solicits papers on Copyright and
Documentation. These are to be published either as a separate issue or as a continuing
series for AD in 1967 and 1968. All viewpoints are welcomed so that this increasingly
important topic may be examined by the AD readers.


A. W. ELIAS


Watson Davis (1896-1967)

We regret to record the passing, on
June 27, 1967, of Mr. Watson Davis.
Mr. Davis was the founder of the
American Documentation Institute
and its President from 1937 to 1947.
As scientist, journalist, and documentalist
his honors and awards are
too numerous to mention. We mourn
the loss of a pioneer of our profession.
The point is that libraries are collections of files, files
that have to be updated and manipulated so that you can
search them from various angles. No one can challenge
the fact that file manipulation is a task of the sort that
EAM and EDP devices are good at, providing someone
can write instructions for them.

However, libraries are evolving rapidly beyond the
status of passive archives that scholars browse in. In
science and technology, they are challenged with an ever-increasing
urgency. At one moment someone wants a
copy of a document that he has already accurately identified.
At the next, someone else needs a list of report or
article titles relevant to a problem at hand. Half an
hour later, the need may be for a degree of assurance
that there’s nothing in the files that already answers a
question that has come up.

It is probably because of the particular suitability of 
electronic data processing to file manipulation that not 
too long ago—maybe ten years—there was common con- 
viction that automation would take care of all this. There 
seemed to be no reason why you couldn’t eventually put 
scientific or engineering information in a machine’s mem- 
ory, as that information was generated, and pull out 
whatever you wanted whenever you needed it. We now 
recognize that we cannot yet organize files and inquiries 
so as to realize this ideal. 

Evan Herbert, Associate Editor of International Science 
and Technology, put it this way in a recent article (3): 
“New ways of manipulating data can give instant access 
to networks of files, but retrieval of information still 
hinges on the transfer of meanings.” 

This brings us to consideration of where we stand today 
in the evolution toward the fullest possible application of 
computer technology to libraries. For purposes of discus- 
sion, let us divide this evolutionary process into three 
phases. 

The first phase is automation of the files simply to re- 
place manual operations with machine operations in per- 
forming library tasks of conventional sorts. 

Tremendous strides have already been and are in the 
process of being taken toward automating conventional 
library operations. Punched-card, EAM systems have 
long since been adopted by many, many libraries in this 
country and abroad for a variety of purposes—control 
over the ordering, receipts, cataloging, and routing of 
documents, monitoring of lending and recovering them, 
production and arranging of card catalogs, and the like. 
Many of these EAM systems are presently being converted
to EDP.

More recently, more sophisticated automation efforts
have been undertaken. Each of the three National libraries
(agricultural, congressional, and medical) has
completed extensive studies of automation possibilities
and automated various operations to varying degrees. The
Library of Medicine, for example, produces Index Medicus
from machine-stored, sorted, and printed records.
The National Agricultural Library is embarking on a
broad program of mechanization. The Library of Congress
already turns out its listing of new serial titles from
an automated file, and has recently embarked on a pilot
study (sponsored by the Council on Library Resources)
of the feasibility of making catalog data in machine-readable
form available to all libraries.

Actually, this kind of automation is so widely and
rapidly being introduced as to outstrip anyone’s ability
to describe the present state of affairs accurately. The
Documentation Division of SLA and the Library Technology
Project of ALA are currently engaged in a survey
of what normal library functions, such as “payroll accounts,
catalog card production, KWIC indexes, serial
records, union lists, circulation control, and current
awareness services,” individual libraries have automated
or plan to automate. When the results of this survey are
in, we will know better where we stand; but there is no
question that the first phase of computer application to
libraries has been completed in the sense that the know-how
has been developed, even though problems of interchange
of machine records remain and may even be
exacerbated by the present uncoordinated stampede
toward computers.

The second phase of library automation is the one in
which we are today. It involves the question of which
documents in the file contain information on a specific
subject. In the first phase, we have automated our inventory
control over packages of information; in the
second, we are trying to automate the process of determining
which packages in the inventory contain answers
to particular questions. Like the bibliographic inventory
control just referred to, this process of identifying documents
by information content is something that librarians have
been doing for years. However, the enormous increase in
the body of knowledge, the increase in variety of uses for
it, and, particularly in science and technology, the demand
for speed and flexibility of retrieval have overtaxed
the conventional man-based systems. After all, the file
in which the relevant items are sought may be a huge one
of 10^9 items or more, and note that the word here is
“items,” not “bits.”

But the effort is still one of automating conventional
operations on the files, although in the interests of getting 
the most possible use out of machines many innovations 
in file organization have been tried. The best known of 
these, of course, is coordinate indexing. 
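
As a rough illustration of the coordinate indexing idea, the sketch below posts each document number under every descriptor assigned to it and answers a request by intersecting the postings for the requested descriptors. The document numbers, descriptors, and function names are hypothetical and chosen only for the example; the article itself does not describe any particular implementation.

```python
from collections import defaultdict


def build_index(documents):
    """Map each descriptor to the set of document numbers posted under it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms:
            index[term].add(doc_id)
    return index


def coordinate_search(index, required_terms):
    """Return the documents posted under every required descriptor (logical AND)."""
    postings = [index.get(term, set()) for term in required_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical sample collection: document number -> assigned descriptors.
documents = {
    101: {"libraries", "automation", "cataloging"},
    102: {"libraries", "circulation"},
    103: {"automation", "indexing", "libraries"},
}
index = build_index(documents)
result = coordinate_search(index, ["libraries", "automation"])
print(sorted(result))
```

The search only coordinates descriptors that an indexer has already assigned; it does nothing about the "transfer of meanings" problem discussed next.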

The snag that has caught us in this phase is the “trans- 
fer of meanings" difficulty referred to by Herbert. Viewed 
from a slightly different angle, it is also called the “natural 
language" problem. Many automated systems have subject
as well as bibliographic tags attached to the information
in their files, but subject searching by unaided
computers is so far considerably less than perfect. At
the moment, it seems more important to identify what
kinds of aid computers can offer in finding what kinds of
information than to try to turn the whole job over to
computers. The emphasis is on machine-aided rather than
all-machine systems, and possibly it will stay there for a
long, long time.




The application of computers to library operations is
discussed in broad terms. Three phases in automation
of libraries are identified: the mechanization of conventional
operations such as bibliographical control
processes and administrative monitoring systems; the
automation of search processes, based on subject matter;
and the move toward new and different kinds of
services that computer technology may make possible.
We are in the second phase, and snagged by the difficulty
experienced by computers dealing with natural
language and subjective ambiguities. To move forward
through phase two will require a better dialog capacity
between man and machine than presently exists.
Before progressing into the third phase, a better identification
of the purposes that our files of information are to
serve will be needed.

Practical considerations affecting computer adoption
by libraries are identified as: the need to stay in business
during conversion to automated modes; the necessity
for demonstrating in advance the economic advantages
of conversion; the difficulty of proving in advance
that the conversion will meet real user needs; and the
standardization and compatibility problems that will
have to be solved to make the various automated libraries
able to use one another's services efficiently.

Based on a talk delivered at the 1966 Spring Joint Computer
Conference in Boston.

This is an attempt to identify, in broad terms, the
point that has been reached in marrying computer technology
and library operations, and to sketch the more important
possibilities and problems that lie ahead as the
process continues. Rather than exploring technical questions
of specific computer potentialities in specific library
applications, it views the problem very broadly, and with
emphasis on the library rather than the computer partner
in the prospective marriage.

A recent System Development Corporation publication
(1) contains the following statement:

Modern information technology has made it possible
to place much of the accumulated knowledge of the
human race within reach of a man's fingertips, so to
speak. But the capacity of executives, scientists, and
scholars to absorb information has not increased. Therefore,
as the amount of available information increases,
there is a parallel need for a more precise capability to
retrieve specific data in any area of interest.

A considerable percentage of the accumulated knowledge
referred to passes through or into libraries. Their
business is determination of file systems and file management
to provide this capability. A library is a
set of files. For example, the Library of Congress has a
basic set of files that in all contain some 42 million items.
Thirteen million are books, serials, bound volumes of
newspapers, and the like; 18 million are manuscripts.
Then there are 21 million more items in files of maps,
microprints, music, photographs, and so on (2).

Means of access to these collections are files. The basic
Library of Congress catalog card file has about seven
million titles in it, but this covers only parts of the document
collection. Beyond the basic file are the files that
permit access by title, subject, author, etc. Beyond these
there are collections of documents, such as abstracting
and indexing journals, that are the search tools needed
to approach the other files; and of course, the files of
cards needed to get at the search tools. Then there are
what you might call administrative files: records of documents
on loan or ordered, or being bound, and what not.

Of course, any one problem may require reference to
many of these files. Furthermore, any change, such as
subscribing to a new journal or just receiving the latest
issue of an old one, is like a rock thrown into a lake: it
sends a ripple of change across all the surrounding files.
To move forward in this phase we need improvement
in the man-machine interface. One way of increasing the
usefulness of computers in this aspect of information
retrieval is to develop a computer configuration that lets
the user play a 20-questions sort of game, a real-time
dialog that zeroes in on agreement as to just what information
in the file most closely corresponds to a need of
the moment. If the computer learns in the process so as
to enter the next dialog with better insight, so much the
better. But the contribution of computers to information
retrieval is going to remain disappointingly small until
the dialog requirement is met.
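
One way to picture such a 20-questions dialog is the minimal sketch below. It assumes each document carries a set of descriptors and that the user can answer yes or no to "is this descriptor relevant?"; at each turn the program asks about the descriptor that splits the remaining candidates most evenly. All names and the sample data are hypothetical, and the sketch does not attempt the learning between dialogs that the text hopes for.

```python
def best_question(candidates, asked):
    """Pick the unasked descriptor that splits the candidates most evenly."""
    terms = set().union(*candidates.values()) - asked
    half = len(candidates) / 2
    best = None
    for term in sorted(terms):
        count = sum(1 for descriptors in candidates.values() if term in descriptors)
        # A descriptor held by all or none of the candidates tells us nothing.
        if 0 < count < len(candidates):
            score = abs(count - half)
            if best is None or score < best[0]:
                best = (score, term)
    return best[1] if best else None


def dialog(documents, answer):
    """Narrow the file by repeated yes/no questions; return the surviving ids."""
    candidates = dict(documents)
    asked = set()
    while len(candidates) > 1:
        term = best_question(candidates, asked)
        if term is None:
            break  # remaining documents cannot be told apart
        asked.add(term)
        wanted = answer(term)
        candidates = {
            doc_id: descriptors
            for doc_id, descriptors in candidates.items()
            if (term in descriptors) == wanted
        }
    return sorted(candidates)


# Hypothetical file and a simulated user whose need is "physics" and "indexing".
documents = {
    1: {"chemistry", "patents"},
    2: {"chemistry", "indexing"},
    3: {"physics", "indexing"},
    4: {"physics", "patents"},
}
need = {"physics", "indexing"}
print(dialog(documents, lambda term: term in need))
```

In a real console session the `answer` function would prompt the user rather than consult a fixed set.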
In the third phase of evolution toward mechanized information
services, conventional operations may disappear
almost completely, and the storage and search will
be for information itself without regard for the item or
document that contains it. As one toiler (4) in this vineyard
recently put it: "The ultimate goal seems to have
been posed: bookless libraries."


Here we need clearer understanding of the purposes 
that our files of information are to serve. We must 
grasp, for instance, the desirable difference in structure 
and manipulation of files to be used here and now as opposed
to archive files to be used by posterity. Libraries
and computer experts together are going to have to ex- 
plore the future requirements for filed information and 
the implications of these requirements for computer 
applications. 

One thing is clear. The ultimate system is going to 
require the capability to thread through files of greater 
variety than any now known and from a greater variety 
of angles to attack than has yet been achieved. Every 
file, subfile, and superfile must be interconnected so that 
any and every one of them can be queried at will, with 
any combination of them brought to bear on a given 
problem. To risk an example: answering a single question 
may require fast display of a single page, selected on the
basis of subject, age, author, availability, restrictions on 
use, language, accuracy, prior and subsequent or related 
information on the same subject, or any combination of 
these. And, the information sought may not even appear 
anywhere in printed form on any page. 
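
The kind of query described above, in which any combination of attributes can be brought to bear on one request, might be sketched as follows. The record fields, sample pages, and the `query` helper are hypothetical and cover only a few of the attributes listed in the text; a criterion may be either an exact value or a predicate function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: str
    subject: str
    author: str
    year: int
    language: str
    restricted: bool


def query(pages, **criteria):
    """Return pages matching every criterion given as a value or a predicate."""

    def matches(page):
        for field_name, wanted in criteria.items():
            value = getattr(page, field_name)
            if callable(wanted):
                if not wanted(value):
                    return False
            elif value != wanted:
                return False
        return True

    return [page for page in pages if matches(page)]


# Hypothetical file of page records.
pages = [
    Page("p1", "indexing", "Smith", 1964, "English", False),
    Page("p2", "indexing", "Jones", 1966, "English", False),
    Page("p3", "indexing", "Braun", 1966, "German", False),
    Page("p4", "retrieval", "Jones", 1966, "English", True),
]
hits = query(pages, subject="indexing", language="English", year=lambda y: y >= 1965)
print([page.page_id for page in hits])
```

The sketch scans one flat file; the interconnected files, subfiles, and superfiles the text calls for would need shared identifiers across many such files.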

This phase we have not really entered. Some experi- 
mental steps are being taken toward it, however. Project 
MAC at MIT, which involves shared time use from re- 
mote consoles of a pool of information that is heterogeneous
from more angles than can easily be listed, is one
example (5).

The point we would like to make here, though, is that 
either by foresight or experiment the purposes that future 
files and retrieval systems must serve, and the roles that 
computers can play, will have to be worked out by the 
library and computer communities in close conjunction if 
we are to move through phase three at all. 

Turn now to certain practical considerations that are 
also going to affect both the degree and the rate of computer
adoption by the library community, in all three
phases. 


First, libraries as they evolve must stay in business as
they do so. They cannot shut down for retooling for a
next-generation computerized model.

Second, mechanization of old capabilities will not 
likely occur unless it guarantees large economic gain 
over conventional ways of doing things. 

Third, new capabilities will not meet with an eager 
reception until someone offers compelling proof that 
they will meet a real user need. 

Fourth, the largest pay-off from automation will only
be achieved when many kinds of computer-readable
records can be freely interchanged among individual
libraries, and this introduces the ancient stumbling-block
of standardization.


It is important not to underestimate the basically con- 
servative attitude that underlies these four points. It 
arises out of a long history of financial starvation of 
library management. It has been intensified in recent 
years by the exponentially increasing amount of information
that has to be obtained and filed, and the increasing
library manpower and space that this entails. Any addi- 
tional subsidizing of libraries that results from recent 
legislation may help to erode library conservatism. How- 
ever, the tendency will be to change cautiously, and to 
use whatever money becomes available for the conven- 
tional purposes that have been so hard to achieve in the 
past with limited cash—buying more books and building 
more space. 

With regard to the first of the four points, not much 
needs to be said. Regardless of what use any one in- 
dividually may make of libraries, they really cannot go 
out of business for any appreciable length of time. No 
matter how logical it may be to put the whole card cata- 
log on magnetic tapes, nobody is going to be willing to get 
along without the cards while the tapes are in the making. 

The second point was the need for economic justification
of change in established ways of operating. This is
not always as simple as it may seem. There is psychologi- 
cal difficulty in scrapping an enormous investment in 
conventional tools and training. There is logical difficulty 
in demonstrating a favorable cost-benefit ratio when the 
benefit rests on some still unknown quantity. No simple 
assertion that computers will do something cheaper is 
likely to persuade librarians. They have a right to be 
suspicious, and they are. 

Thirdly, when it comes to proposing computer ap- 
plications to achieve capabilities beyond the present ones, 
there is another barrier. One of the commonest allega- 
tions in the science information business is that scientists 
and engineers only accept conventional services because 
they haven’t had access to better ones. But, in spite of 
years of effort and hundreds of thousands of dollars spent 
on trying to identify the real needs of users of scientific 
information, we remain unable to describe them to any- 
one’s satisfaction. We are unable even to predict whether 
a new service would find customers even if it could be 
demonstrated that it satisfied a need. 

Finally, with respect to the need for standardization, it 
is clear that part of the necessary economic justification 
referred to above will rest on increased capability to 
engage in cooperative load-sharing arrangements. However,
besides challenging the historically derived bias
toward local self-sufficiency, this will require compatibility
in a broad area ranging from cataloging rules to
machine formatting.

It follows that proposals to set up computer-based 
operations that offer new and appealing services may not 
meet as kind a reception as expected. Furthermore, the 
same argument works in reverse. Proposals that result 
in dropping or curtailing past services find hard going. A 
case in point is the opposition that computer-stored co- 
ordinate indexing, or reduction of files to microstorage, 
meets when someone realizes that the time-honored 
pastime of "browsing" will be threatened. 

These obstacles are not insuperable; but they need to 
be recognized more clearly. Evolution toward better 
libraries with different roles will take place as computer 
applications are made, as the progress through phase one 
and into phase two already has shown. To speed the 
process through phase three simply requires that the computer
community approach realistically the difficulties
that face the library community and join with it in spelling
out both the services that the systems of the future
must provide and ways of realizing them.


Note: Time limitations on the talk on which this
paper is based precluded extended discussion of present
operations and experiments in the introduction of computers
into libraries. Actually, there is such an enormous
amount of activity in this field that one seeking representative
examples is confronted by an embarrassment of
riches. This profusion of examples may be discovered in
such bibliographies as the one compiled by Edward M.
McCormick for the University of Illinois Clinic on
Library Applications of Data Processing (6), or the
several contained in the proceedings of the conference
on libraries and automation held at Airlie Foundation
(7).
A Documentation Training Model*

This contribution reports on the design and development
of a series of model information retrieval and
library systems. These are designed to allow documentation
students access to a variety of basic files, permit
lecture demonstrations, enable comparisons (since all
files relate to the same collection), and to serve as the
depository for the input assignments of advanced
students.

Comments are provided as to the choice of systems
made, and their potential applications for research.

C. D. BATTY, B.A., F.L.A.

Head of Department of Information Retrieval Studies,
College of Librarianship, Wales.

* The basis of this article was a paper read at the FID Congress,
Washington, 1965.

The training model to be discussed is based on an
integrated set of manual and mechanised indexing systems,
all handling the same body of information from a limited
subject field. By extending the scope of the model's
operations to include prior and subsequent activities like the
selection and abstracting of the documents to be indexed,
and the preparation and dissemination of material
through the use of the indexes, the model may be used for
a wide range of documentation training, principally at
three levels: demonstration by the lecturer to the students;
use by the students in the retrieval and dissemination
of information; and development by the students
through the selection and abstracting of documents, the
indexing and storage of information, and ultimately the
use of feedback from the dissemination stage to improve
the systems.

There are principally two reasons for the development
of such a model. The first concerns access to systems in
working libraries; the second concerns the usefulness of
working systems to professional education. Professional
training has two closely linked aspects; theoretical training
belongs in the classroom, practical training in the
working library. But while this division is fairly
satisfactory in areas such as reference work, administration
and special areas of attention like children's libraries and
music libraries, training in documentation and indexing
presents a different situation.

Few document indexing systems in special libraries are
readily available in the vicinity of most library schools
and even though some may be, one is most unlikely to find
a complete range of all types of system. More conclusively,
libraries of this type are highly conscious of
efficient performance; they will therefore be the more
reluctant to allow students ready access to their systems,
and will hardly ever allow whole classes of students to
operate and experiment with them. In any case the
instructional value of existing documentation systems in
libraries is lessened by the wide range of subjects covered
collectively by all the systems encountered. The
inevitable unfamiliarity of many of the subjects will limit
reliable assessment of a system's efficiency and the variety
of subjects will effectively inhibit that comparative view
which should provide the best instruction of all.

It is possible to go further and say that in general a
complete working system is unnecessary for instructional
purposes. There is a minimum size, it is true, but once
this has been achieved, further development is a
dispensable luxury except where the operations involved
constitute the instruction in hand. Rather than use existing
documentation systems, then, it may actually be
better to construct a set of representative systems within
the library school. This would ensure: first, a full range
of possible systems; second, constant, immediate, and
unrestricted access; and third, better facilities for
comparative study, since all systems could handle the same
material. Certain economies of time and effort would also
be possible where systems shared operations or equipment,
for instance, in the selection or abstracting of documents,
in devising languages for similar indexing systems,
or in the use of equipment handling information recorded
in a similar physical form.

The reason why such a model might seem in any way
novel is partly historical and partly financial. In the
United Kingdom a concentration in the past on public
library needs and standards, which even now affects the
teaching of more specialized disciplines and techniques,
and the inclusion of library schools in colleges of commerce
and technology (which belong to municipal authorities)
rather than in Universities has meant not only that
expensive equipment is low on any list of priorities but,
more important, that little staff time is available for the
development of new training methods.

Only recently have a few library schools broken away
from this pattern, notably the post-graduate school in
Sheffield University, the library schools in universities
in Scotland and Northern Ireland, and the College of
Librarianship in Wales, an independent college concerned
only with library science.

The model under discussion has grown in concept and
execution from a practice common enough in British
library schools: that of constructing miniature catalogues
from the entries composed in practical cataloguing sessions.
These, without much trouble, can be made to form
parallel dictionary and classified catalogues, not only to
give the students practice in their construction and
manipulation, but also to offer a comparison of types. An
easy translation into the field of documentation and indexing
was made when the new Library Association examination
syllabus offered such papers as the Handling and
Dissemination of Information, Special Librarianship, the
Theory of Classification, and (in the post-graduate syllabus)
Indexing, Abstracting, and Information Retrieval.
At the same time the new syllabus encouraged attention
in this field in broader, though still relevant, papers like
The Organisation of Knowledge.

In a paper to the 1964 annual conference of ASLIB the
author described work involved in the construction of a
simple machine sorted punched card index to the periodical
literature of a limited field and considered some
problems of its demonstration and use in the teaching of
mechanized information retrieval. The establishment of
the College of Librarianship in Wales offered occasion for
the rationalization of work and ideas arising from this
project. Temporarily within the limits of the Library
Association examination syllabus, but encouraged to look
ahead to more advanced courses, discussion began on the
provision of a whole range of depth indexing methods,
from a conventional classified card index to computer
systems, and on the possibility of extending this effort
into the wider field of documentation by selecting and
preparing material, and disseminating it, all as part of
the same sequence of operations.

The material used as a basis for the model is literature
on information retrieval, because of the familiarity of
the concepts and terminology, the limited size of the
field, and therefore of the index languages, and the
existence of Library Science Abstracts as a predigested form
from which the students may work and to which easy
reference may be made. And since Library Science Abstracts
did not begin until 1950 there remains a considerable
body of material requiring selection and abstracting.

The model is being built up in three stages.

The first stage comprises manual systems. Two of
these, a conventional classified catalogue and a rotated
classified file in visible index form, use a new scheme of
classification for library science devised by the
Classification Research Group in London.

The same scheme is used for the College library; this
is its first practical application (beginning actually
before its publication), beginning in March 1966. After some
15 months of experience, the College began to circulate
CRG/EXTRA (EXTensions, Revisions, and Alterations)
to outline problems encountered and solutions proposed.
It is hoped in this way to secure comment and discussion
on use to assist the Classification Research Group in
the work on a second edition. It is a faceted scheme,
covering all types of library and library service, and
the materials and techniques of librarianship, as well as
questions affecting the profession and a
variety of fringe topics, from education to the book trade.
All facets (or classes) are listed in the schedule in general
to specific order, and combine retroactively, to give
the citation order for elements in compound class numbers:
libraries - services - materials - processes - general
questions. The classifier must be able not only to analyze
abstracts into component terms (as for coordinate
indexing) but also to recognize an order of significance.
Once this has been done the application of the scheme
is simple.

The conventional catalogue is on 195 x 76 mm cards
and the entries show the class number by which they
are filed, author, title and source of the article, as well
as the abstract number. There is an alphabetical subject
index to the main classified file constructed according
to chain procedure, where each element (each link in
the hierarchical chain) is indexed with its appropriate
superordinate terms used as descriptors, to provide
relevant access for inquiries initiated at too general a
level. See Fig. 1. About five hundred abstracts have
so far been classified with an average of [illegible]
elements in each class number.

The nature of the classification scheme has led to
the inclusion in the model of a variation of the rotated
classified file. Instead of rotating the elements in the class
number, however (so that ABCD becomes in turn BCDA,
CDAB, and DABC on four different entries), this index adopts
what actually happens in many so-called rotated indexes
produced on computers, where the entire line is
shifted along to the left, one element at a time. This
provides multiple entry as before, since each element in
turn appears at a marked filing position, but it does not
disturb their order, thus preserving the classified heading
as a coded analysis of the whole subject of the document
and as an address in the classified shelf order.

This index uses Kalamazoo "Copystrip" binders,
containing twenty-five dividers with thirty 2-line entries
per page, and each entry showing the class number,
author, and [illegible] title, and the abstract number. See
Fig. 2.
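The difference between the two kinds of rotation described above can be sketched in a few lines of Python. This is only an illustration under assumed names (`cyclic_rotations`, `shifted_entries`, and `format_shifted` are hypothetical, and the "|" standing for the marked filing position is an invented display convention), not a description of how the Copystrip file was actually produced.

```python
def cyclic_rotations(elements):
    """True rotation: ABCD yields ABCD, BCDA, CDAB, DABC (order is disturbed)."""
    return [elements[i:] + elements[:i] for i in range(len(elements))]


def shifted_entries(elements):
    """Shifted line: the order is kept and only the filing position moves.

    Each entry is a (lead, filing) pair. `lead` holds the elements to the
    left of the filing mark, `filing` the elements from the mark onward.
    """
    return [(elements[:i], elements[i:]) for i in range(len(elements))]


def format_shifted(lead, filing, width=4):
    """Right-align the lead so every filing element starts in the same column."""
    return "".join(lead).rjust(width) + "|" + "".join(filing)


heading = list("ABCD")  # a compound class number with four elements

for entry in cyclic_rotations(heading):
    print("".join(entry))

for lead, filing in shifted_entries(heading):
    print(format_shifted(lead, filing))
```

In the shifted form every entry still reads ABCD from left to right, so the whole class number remains available as a shelf address, while each element in turn falls at the filing column.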
Proposed: a processing center for public libraries
in Southern California, by John and Dorcas Connor.
Calif.Lib., 14(3) March 1953, 155-157 and 182.

LSA 2795

Fig. 1a. Main entry for the conventional classified catalogue

Processing: public libraries Vs G
Administration: processing: public libraries Vs G E
Cooperation: processing: public libraries Vs G Ey
California: cooperation: processing

Fig. 1b. Alphabetical subject index entries by chain procedure for the main entry shown in Fig. 1a
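Chain procedure, as described above, indexes each link of the hierarchical chain together with its superordinate terms. A minimal sketch follows, assuming a chain given as a general-to-specific list of terms; the function name `chain_index_entries` and the sample chain are illustrative assumptions, loosely modelled on the entries in Fig. 1b rather than taken from the College's actual index.

```python
def chain_index_entries(chain):
    """Return one index entry per link of a general-to-specific chain.

    Each entry leads with the link itself, followed by its superordinate
    terms in order from the nearest to the most general.
    """
    entries = []
    for depth in range(1, len(chain) + 1):
        link_and_context = list(reversed(chain[:depth]))
        text = ": ".join(link_and_context)
        entries.append(text[:1].upper() + text[1:])  # capitalize the lead term
    return entries


# Hypothetical chain for the document shown in Fig. 1a.
chain = ["public libraries", "processing", "cooperation", "California"]

for entry in chain_index_entries(chain):
    print(entry)
```

An inquiry started at too general a level (say, under "Processing") still meets an entry that leads to the right part of the classified file.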


732 GAUNT(R) A low cost cataloging system [Indiana]
2795 CONNOR(J&D) Processing center for [PLs] in S.California
11269 BRUNO(EW) The California union catalog

732 GAUNT(R) A low cost cataloging system [Indiana]
2795 CONNOR(J&D) Processing center for [PLs] in S.California
11269 BRUNO(EW) The California union catalog

13019 MACQUARRIE(C) Cost survey [of processing] in S.California
2795 CONNOR(J&D) Processing center for [PLs] in S.California
The other manual systems are all coordinate indexes,
based on the same index language, though not using
it in identical ways. Two of them are term entry systems:
respectively number matching and optical coincidence.
At first it was intended not to include in the
model more than one system of any type (in this case
manual systems based on term entry) but to construct
small static models simply to demonstrate the different
devices through which the same basic system might
be manifested. An optical coincidence system was
obviously better for the model because of its ease of
operation, with a number matching system as its static
counterpart, but since in practice it proved easier to
construct the number matching system first and transfer
it en bloc to optical coincidence feature cards it was
decided to keep them both. The number matching
system uses standard 200 x 125 mm index cards filed in
alphabetical order of the index words; the optical
coincidence system uses "Manisort" equipment, with cards of
10,000 document capacity. It has been observed that
abstracts require an average of eight terms for adequate
description.

The third manual coordinate index is an item entry
system using Cope-Chat standard P.1. 75-hole cards,
6 x 4 in. For this system the thesaurus is given a random
number coding of pairs of two digit numbers; since the
whole card is regarded as a single field the total notation
available is 75²/2 = 2812.5 places, ample for coding the
estimated 500- or 600-word thesaurus. The cards show
only the abstract number and the serial numbers of
the index words, together with the random number
coding. It is physically possible to include the whole
abstract on a card of this kind, but in the model it
was felt that to duplicate the master file of abstracts
and to make one index independent would be to the
disadvantage of the model as a whole. See Fig. 3.

Stage two in the development of the model concerns
mechanized systems using 80-column punched cards and
ICT equipment, comprising an alphanumeric keypunch,
a verifier, an ICT 302 sorter, an interpreter, a mark
sensing and reproducing punch and a tabulator-printer.
Most of the work on the punched card indexes has
been done on the mark sensing punch (so that students
could prepare the cards themselves), and the
sorter. The tabulator-printer lists abstract numbers of
cards retrieved in response to a particular search. All
these systems are item entry systems.

The first is the simplest of all, using positional punching
for serially numbered index terms. These are
punched into a single field of seventy columns. A
second field comprising the last 10 columns is reserved
for the abstract number: at present this is the Library
Science Abstracts number of up to five digits, punched
into the last five columns only, but if, as is intended,
an independent file of abstracts of pre-1950 material
is built up, then distinguishing prefixes may be used
in the earlier columns of that field. See Fig. 4.
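Positional punching of serially numbered terms can be sketched as follows. The sketch assumes the usual twelve punching positions per column of an 80-column card (rows 12, 11, and 0 to 9), so that seventy columns give 840 positions; the particular mapping from serial number to column and row, and the names `position_for_term`, `punch_card`, and `card_matches`, are hypothetical and are not taken from the article.

```python
ROWS = ["12", "11", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
TERM_COLUMNS = 70       # columns 1-70 hold the index terms
NUMBER_FIELD_WIDTH = 10  # columns 71-80 hold the abstract number


def position_for_term(serial):
    """Map a 1-based term serial number to a (column, row) punching position."""
    capacity = TERM_COLUMNS * len(ROWS)
    if not 1 <= serial <= capacity:
        raise ValueError(f"serial must lie between 1 and {capacity}")
    column, row_index = divmod(serial - 1, len(ROWS))
    return column + 1, ROWS[row_index]


def punch_card(term_serials, abstract_number):
    """Return the set of punched positions and the right-justified number field."""
    holes = {position_for_term(s) for s in term_serials}
    number_field = str(abstract_number).rjust(NUMBER_FIELD_WIDTH)
    return holes, number_field


def card_matches(holes, query_serials):
    """A card answers a query when every query term's position is punched."""
    return all(position_for_term(s) in holes for s in query_serials)


holes, number_field = punch_card([3, 41, 517], 2795)
print(sorted(holes), repr(number_field))
print(card_matches(holes, [3, 517]), card_matches(holes, [3, 4]))
```

Because each term has a position of its own, a search on this system cannot produce false drops; the price is that the thesaurus may not outgrow the number of positions in the field.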
The second punched card system is simply a mechanized
form of the edge notched card index mentioned
above, with the difference that the pairs of two digit
numbers used in the random number coding are divided
between two fields of 100 positions each (the first two
digit number of each pair in the first field and the
second number in the second field). This increases the
notational capacity to 10,000 codings and, by a wider
scattering of index terms, lowers the proportion of
false drops. Cols. 1-10 and 11-20 are used for the
subject coding; cols. 21-70 are reserved for author/title
and/or source, though this field has not yet been used.
Cols. 71-80 are reserved for the abstract number as
described above. See Fig. 5.
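How a false drop arises under this kind of coding can be shown with a small sketch. It assumes that each term is assigned a pair of two digit numbers, that the first number of every pair is punched in one 100-position field and the second in the other, and that a card "drops" when both numbers of every query term are present. The code table, the term names, and the function names are invented for illustration; in practice the pairs would be assigned at random.

```python
FIELD_SIZE = 100                    # positions 00-99 in each field
CAPACITY = FIELD_SIZE * FIELD_SIZE  # 10,000 distinct pairs

# Hypothetical code table: term -> (number in field 1, number in field 2).
CODES = {
    "cataloguing": (12, 34),
    "cooperation": (56, 78),
    "costs": (12, 78),
}


def encode(terms, codes=CODES):
    """Superimpose the codes of all terms assigned to one abstract."""
    field1 = {codes[t][0] for t in terms}
    field2 = {codes[t][1] for t in terms}
    return field1, field2


def card_drops(card, query_terms, codes=CODES):
    """True when the card carries both numbers of every query term."""
    field1, field2 = card
    return all(codes[t][0] in field1 and codes[t][1] in field2
               for t in query_terms)


card = encode(["cataloguing", "cooperation"])
print(CAPACITY)
print(card_drops(card, ["cooperation"]))  # a genuine hit
print(card_drops(card, ["costs"]))        # a false drop: never indexed
```

The card was never indexed under "costs", yet it drops because 12 and 78 were both punched for other terms. Spreading the pairs over a larger notation makes such coincidences rarer, which is the effect the text attributes to the two-field arrangement.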

The third is [illegible], involving [illegible] of
the index language, and a consequent categorization of
the coding for the punched card. For obvious reasons
this is not yet a part of the model and may never be.
It is a useful adjunct that feeds the model with ideas.
One version designed for 40-column cards and equipment
categorizes the index language into three main
fields of library, material, and process, and a fourth
field of the names of people and places. A mixture of
direct and condensed coding and the combination of
zone and digital punchings has substantially increased the
capacity of the smaller card.
Fig. 3. Edge-punched card using random coding of pairs of two digit numbers
Fig. 4. Item entry system on 80-column cards using direct coding and positional punching


It might be relevant to say something here about the
coordinate index thesaurus. This is basically the same
for all systems. It was worked out originally for the
first of the mechanized systems just mentioned (the
one using serial positional punching) by the empirical
method of indexing as many abstracts as possible and
collating the terms used. An early assumption that
a thesaurus for information retrieval and documentation
would barely exceed 500 terms (and that therefore
positional punching could be considered) has been
borne out in practice, though there are certainly more
terms to be added. The thesaurus is maintained on
punched cards for control and easy production.

Stage three concerns additional or substitute systems
considered or planned for a computer now under
consideration for the College, to be used on a time-sharing
basis with other institutions in the area. These
computer systems include a classified file based on the
conventional manual classified file, to be developed as a
classified catalogue programme, a serial file (item entry)
for sequential scanning, and an inverted file (term
entry) searched through a random access device. In
addition to these, which develop naturally out of systems
in the first and second stages of development,
it is intended to add a KWIC index and a lattice index.
The latter is a type comparatively little studied as yet
and should provide an interesting field for research.
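The contrast between a serial file (item entry) and an inverted file (term entry) can be sketched as below. The abstract numbers echo those shown in the figures, but the index terms attached to them, and all function names, are assumptions made for the example; nothing here describes the programmes actually planned for the College.

```python
from collections import defaultdict

# Serial file: one record per abstract, carrying all of its index terms.
SERIAL_FILE = [
    (732, {"cataloguing", "costs"}),
    (2795, {"processing", "public libraries", "cooperation"}),
    (11269, {"union catalogues", "cooperation"}),
    (13019, {"processing", "costs"}),
]


def scan_serial_file(serial_file, query):
    """Item entry search: examine every record in sequence."""
    return [number for number, terms in serial_file if query <= terms]


def build_inverted_file(serial_file):
    """Term entry: one record per term, listing the abstracts that carry it."""
    inverted = defaultdict(set)
    for number, terms in serial_file:
        for term in terms:
            inverted[term].add(number)
    return dict(inverted)


def search_inverted_file(inverted, query):
    """Intersect the postings of each query term, as in optical coincidence."""
    if not query:
        return []
    postings = [inverted.get(term, set()) for term in query]
    return sorted(set.intersection(*postings))


query = {"processing", "cooperation"}
inverted = build_inverted_file(SERIAL_FILE)
print(scan_serial_file(SERIAL_FILE, query))
print(search_inverted_file(inverted, query))
```

Both searches return the same abstracts. The serial file must read every record, whereas the inverted file touches only the records for the terms asked for, which is why it suits a random access device.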

In addition to the indexes already discussed there is
a master file of 200 x 125 mm index cards containing
all class numbers, index entries, and codings. This is filed
in abstract number order. See Fig. 6.

The model was "primed" with 1,000 documents coded
in each system. This work was done by the staff in
order to test and develop the thesaurus and the
classification scheme, to establish forms and routines for
student work, and to build the model to a size sufficient
for initial demonstration by the lecturer and manipulation
by the student. From this point work by the
students began to develop and extend the model,
controlled and occasionally supplemented by the staff. Staff
participation is now concerned to keep all parts of the
model at the same stage of development.

Fig. 5. Item entry system on 80-column cards using random coding in two fields

2795
CONNOR, John and Dorcas
Proposed: a processing center for public
libraries in Southern California.
Calif.Lib., 14(3) March 1953, 155-157 and 182

It has been the author's present concern to describe
the training model itself; it is hoped at not too distant
a date to offer some account of its use and value. It
might be useful at this stage, however, to indicate in
general terms the uses envisaged for the model and which
it already serves.

Elementary classes in the organization of knowledge
receive demonstration by the lecturer of the indexing
systems, and the pattern of documentation activity that
includes them. At this level little practical work is
possible or even desirable, since the classes are large,
[illegible]

Second year students on relevant courses use the model
as an object lesson on which to base seminar discussion
and to gain experience in its manipulation, setting up
programmes for demand searches, and the general
production of bibliographies.

Classes specializing in documentation, such as in the
handling and dissemination of information, proceed from
manipulation to development of the model. At this stage
students begin by indexing or classifying abstracts for
the model, coding them for each system in turn, and go
on to the preparation of abstracts for the indexer.

It is hoped at a later stage to use the model for
training in SDI systems, but this, like so
much that may be expected from such a model, is still
[illegible]
An Experimental System for Automatic Identification
of Personal Names and Personal Titles in Newspaper Texts

Natural language seems to contain various special-purpose
subsystems, e.g., personal titles, personal
names, dates, street addresses, place names, each
with its own structure which relative to the total
structure of language is rather simple.

An ability to identify automatically words and word
strings belonging to various special-purpose linguistic
subsystems (akin to some thesaurus classes) may prove
to be very useful since they play an important role in
the making of indexes and in various systems for
extracting and distributing information.

This article describes some of the main problems
involved in automatic identification in newspaper texts of
words and word strings belonging to two important
linguistic subsystems, viz., personal titles and names;
lists some of the major rules of an algorithm designed
to perform this task; presents statistics concerning the
algorithm's accuracy and exhaustiveness obtained in
manual application of the algorithm to texts; and
suggests some applications for computer programs
capable of recognizing personal titles and names.

The results obtained indicate that an automatic system
capable of accurate and exhaustive identification
of personal titles and names in texts requires recognition
procedures which are rather complex.

It is therefore suggested that along with researching
and developing methods for high-quality automatic
classification of words in texts, it may be advisable to
set up efficient procedures for manual classification and
tagging of words in texts, and automatic extraction of
data from texts which were recognized either manually
or automatically.

Such action seems appropriate since automatic
extraction of information from manually recognized
texts would probably constitute a valuable service, and,
when automatic procedures for identifying dates,
personal names, personal titles, trade names, company
names, chemical formulas, numbers and measure
words, and so forth become competitive with manual
ones, the data-processing profession will be already in
possession of operational computer programs capable
of extracting data from recognized texts.

CASIMIR BORKOWSKI

Thomas J. Watson IBM Research Center
Yorktown Heights, N.Y.

* Motivation for the Experiment

One of the major questions of the day is the extent
to which a computer can be instructed to identify
various parts of texts written in plain, ordinary language.
In trying to answer this question, we set ourselves the
preliminary limited objective of developing an automatic
procedure for identifying proper names in
English-language texts and classifying them according to
type, for example, as names of persons, names of
organizations, names of places, etc.

The selection of this objective was based on the
following considerations:

1. Proper names are easier to identify and classify
automatically than other parts of texts (a) because
of orthographic and style rules (capitalization, etc.)
and (b) because relative to the rest of the language
their structure is usually quite simple.

2. Identification of proper names can be carried out
largely independently of the identification of other
parts of texts.

3. Attempts at automatic identifications may
increase our knowledge of the structure of language.

4. An ability to identify proper names automatically
may prove to be very useful since proper names play
an important role in the making of various indexes
as well as in various other systems for extracting
and distributing information. Automation of name
identification might therefore present a certain
economic advantage.


Since automatic identification of names of persons
appeared easier than that of proper names of other types,
we decided to commence our investigation of proper
names with an investigation of personal names.

In newspaper texts, personal names are often preceded
by personal titles, that is, words or phrases
bestowed on individuals as a mark of distinction, rank, or
dignity and [illegible] describing or implying office or
vocation.

Because personal titles provide important information 
about the persons bearing these titles and furthermore 
because automatic identification of prenominal titles helps
in automatic identification of personal names, we com- 
bined both automatic identification procedures into a 
single system. 

A newspaper was selected as the source of texts 
because many newspapers are printed by means of 
typesetting tapes which could be converted to computer- 
legible form; because automating the extraction of 
information from newspaper morgues (files of reference
material) presents a challenging problem; and because
newspapers contain a great variety of personal names 
and titles. 


* Some Problems of Automatic Name Identification

Automatic identification of names of persons in texts
is of course not without its difficulties. First of all, many
personal names are homographic, that is orthographically
identical, with other types of words in the language.
This is the case since among the main sources of surnames
are titles, e.g., Baron, King; names of occupations,
e.g., Baker, Smith; topographic terms, e.g., Bridges, Dale;
names of animals, e.g., Bull, Fox; names of places, e.g.,
Danzig, London; names of plants, trees, etc., e.g., Bean,
Bush; personal attributes, e.g., Stern, Wise, and so forth.

There is considerable homography between personal
names and place names, not only because the names of
places are a frequent source of personal names, but also
because many localities were named after
people, as, for example, Berkeley, California, and St.
Augustine, Florida. And to make matters worse, hotels
and business firms can be named after people and are
often referred to by an abbreviated name which is that
of a person, e.g., "I am staying at the Mark Hopkins,"
"Ford was hit by a strike last week." As for personal
names like Elizabeth Arden and Max Factor, they
designate persons as well as business firms, while Philip Morris
is the name of a person, of a corporation, and a brand of
cigarettes.

Yet another difficulty arises in the case of names of
persons, e.g., Madison, Sir Francis Drake, when they
perform a naming function with regard to something,
say an avenue, e.g., Madison Avenue, or a hotel, e.g.,
Hotel Sir Francis Drake. Presumably, it would be
worthwhile to distinguish automatically references to
persons from references to things named after persons.

Still another difficulty in recognizing personal names
results from the fact that personal titles are not unfailing
aids in identifying personal names because titles
themselves can be homographic with other types of
words. For instance, General is a military rank in
General Mobutu but not in General Motors.

Further difficulties result from the fact that some
titles are homographic with given names. It is not a
simple matter to specify the rules that would enable
an automaton to decide when Dean is a personal name
and when it is a personal title, e.g., Dean Rusk, Dean
Wiesner.
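The General Mobutu / General Motors difficulty can be made concrete with a deliberately naive sketch. It assumes a small dictionary of titles and the simple hypothesis "a title followed by a capitalized word introduces a personal name"; both the word lists and the function names are invented here for illustration and are not the rules of the algorithm described in this article.

```python
# Hypothetical dictionary entries, chosen only to illustrate the problem.
TITLES = {"General", "Dean", "President", "Sir"}
KNOWN_NON_PERSONS = {("General", "Motors")}


def naive_candidates(tokens):
    """Pair every title with the capitalized word that follows it."""
    pairs = []
    for first, second in zip(tokens, tokens[1:]):
        if first in TITLES and second[:1].isupper():
            pairs.append((first, second))
    return pairs


def filtered_candidates(tokens):
    """Same rule, but discarding pairs listed as names of things."""
    return [pair for pair in naive_candidates(tokens)
            if pair not in KNOWN_NON_PERSONS]


tokens = "General Mobutu met officials of General Motors".split()
print(naive_candidates(tokens))
print(filtered_candidates(tokens))
```

The naive rule accepts both strings; only an exception list (or some richer context rule) removes the company name. A string such as "Dean Rusk" would defeat this sketch in a different way, since no list of exceptions can say whether Dean is there a title or a given name.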


* State of the Art 


The name of the science dealing with the origins, 
forms, and usage of proper names is onomastics. Although
onomastics has important contributions to make
to many problems connected with computer identification 
of names in texts, at present there seems to be little 
interest among name specialists in computation problems 
such as automatic recognition of names. 

An interesting and—as far as we know—the only 
previous experiment in computer recognition of names 
which has been described in the open literature was per- 
formed by a documentation specialist, Professor Susan 
Artandi at Rutgers School of Library Service (1) (2). 
Professor Artandi used two relatively simple methods 
of identifying proper names in a pioneering experiment 
whose goal was to determine the extent to which a 
computer could assist human editors in indexing docu- 
ments with whose subjects the editors have only a 
minimum amount of familiarity.

The first method extracts from texts capitalized words 
and strings of capitalized words while the second method 
records “all capitalized words which appear in the docu- 
ment text with the four words preceding it, the four 
words following it, and its page number." The propor- 
tion of “useful indexing entries” which was produced by 
these methods was about 50%. 
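The two methods just described are simple enough to sketch. The following is a loose illustration, assuming whitespace tokenization and treating any word with an upper-case first letter as "capitalized"; the function names are hypothetical, page numbers are omitted, and nothing here reproduces Professor Artandi's actual programs.

```python
def capitalized_strings(words):
    """First method: capitalized words and strings of capitalized words."""
    strings, current = [], []
    for word in words:
        if word[:1].isupper():
            current.append(word)
        elif current:
            strings.append(" ".join(current))
            current = []
    if current:
        strings.append(" ".join(current))
    return strings


def capitalized_with_context(words, window=4):
    """Second method: each capitalized word with up to `window` words either side."""
    records = []
    for i, word in enumerate(words):
        if word[:1].isupper():
            before = words[max(0, i - window):i]
            after = words[i + 1:i + 1 + window]
            records.append((before, word, after))
    return records


text = "The committee heard Susan Artandi speak at Rutgers yesterday"
words = text.split()
print(capitalized_strings(words))
for before, word, after in capitalized_with_context(words):
    print(before, word, after)
```

Note that the sketch picks up the sentence-initial "The" along with the genuine names, which hints at why capitalization alone yields many entries of no use to an indexer.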


* Experimental Design 


Our procedure in setting up an automatic method for 
identifying personal names was approximately as follows: 


1. We investigated permissible patterns of personal 
titles and of English, Spanish, Russian, Chinese, and 
other personal names whose occurrence in texts we 
could anticipate. 

2. We searched the literature for information concerning
the structures of personal names and titles,
methods of processing texts, and other data pertinent
to automatic identification of personal names and titles
in texts.

3. We obtained a 60,000-word sample of newspaper
texts and determined patterns of occurrence of personal
names and titles in texts; patterns of personal names
and titles occurring in texts; and problems involved
in distinguishing personal names and titles from each
other and from other parts of texts.

4. Based on investigations 1, 2, and 3, we set up an
automatic procedure designed to identify personal
names and titles in newspaper texts. This procedure
was embodied in flow charts and a dictionary.

5. We tested our procedure manually on a 100,000-word
sample of new newspaper texts, and we amended
the rules and expanded the dictionary on the basis of
the information provided by the tests.

6. We then stabilized the improved procedure, 
tested it out manually on a new 40,000-word sample of 
new newspaper texts, and collected statistics concern- 
ing its accuracy and exhaustiveness. (Our reasons for 
applying the algorithm manually were as follows: 
Our identification system was embodied in dictionary
entries and flow charts which were sufficiently detailed 
to permit accurate execution of recognition procedures 
and we thought that it would not pay to code and 
debug over a period of months what would probably 
turn out to be a “one-shot” program.) 

7. We then investigated what types of errors had 
occurred and proposed various amendments to the 
automatic recognition procedure. 

8. We speculated about possible applications of a
computer program capable of recognizing names and
titles of persons in newspaper texts.


We selected recognition rules, or rather recognition 
hypotheses, which are basically quite simple with the in- 
tention of finding out how many correct identifications 
and how many errors they produce. 

We plan to amend these hypotheses in the light of the 
results obtained. Refinements, additions, reformulations 
of the rules, as well as changes in basic methodology, will 
be brought in as required and the trade-off between the 
complexity and the effectiveness of the rules will be noted. 

In other words, the present set of identification rules
is a first approximation, a first scratch of the surface. We
think, however, that the discovery of the area in which
the rules fail will be helpful in suggesting new directions
and methodology for research on automatic identification
of parts of texts written in ordinary language.

The basic assumptions made by the rules which we are
about to describe may seem quite unsophisticated. They
concern the meanings of words, of phrases, of affixes, of
punctuation marks, of capitalizations, etc., which are
encountered in newspaper texts.


* The Goals 


The goals of our initial system for automatic identifica- 
tion were quite modest. The solution of many difficult or 
complex identification problems was not attempted. For 
instance, we did not attempt to differentiate the names of 
persons from the names of various objects named after 
persons, e.g., a Garand; the names of fictional and symbolic
characters, e.g., Simon Legree; the names of horses
and other animals, e.g., Uncle Maz, Vicar Hanover, and
so forth.

Likewise, we did not attempt to specify the rules for
separating into different strings adjacent names, e.g.,
Winifred Beethoven, which will occasionally occur in
sentences with double-object verbs, e.g., gave in “John 
gave Winifred Beethoven's Rasoumovsky numbers one 
and two”; and in sentences without a comma after an 
adjunct phrase, e.g., “After his encounter with Thomas 
Hood had to retreat,” and so forth. Such an attempt will 
be made later if a significantly high number of contiguous 
names is found to occur in texts. 

The solution of some problems of automatic name and 
title identification may require stronger theoretical as- 
sumptions about sentence and text structure and more 
elaborate techniques of sentence and text analysis. For
instance, since automatic parsing of sentences may be 
helpful in identifying sequences of names each of which 
is followed by its title, e.g., “The President nominated 
John Gordon Ambassador to Guatemala, William T. M. 
Beale, Jr., Ambassador to Jamaica .. .,” future auto- 
matic systems for identifying personal titles and names 
may parse sentences containing double-object verbs and
strings consisting of personal names followed by titles. 
However, since parsing and other types of analyses may 
be expensive, it would seem advisable to apply them only 
when they can reasonably be expected to provide eco- 
nomic solutions to valid problems. 


* Some Identification Rules 


Our rules describe the arrangement in the sentence
of the words, phrases, and punctuation marks which are
pertinent to the identification of names and prenominal
titles. Generally, the description starts with the first,
that is, the leftmost pertinent element of a sentence and
terminates with the last, or rightmost, pertinent element.

For greater ease of reading, the rules are expressed here 
in narrative form. For the sake of brevity, only some of 
their more important features are listed here. A more 
complete description of identification rules is available 
elsewhere (3). 

Our rules for recognizing names of persons take advan- 
tage of the style rules of The New York Times. We would 
conjecture that whereas details of name recognition rules 
may vary from newspaper to newspaper, their general 
pattern will remain fairly stable and independent of 
editorial conventions. 

The rule for identifying personal titles which was se- 
lected as a reasonable first approximation states that a 
word or phrase in text is a personal title either: 


1. If it matches a word or string of words on a list
of titles

or

2. If it matches a word or a string of words which is
on a list of words and phrases which commonly combine
with titles, e.g., Acting, Assistant, Vice, and is
followed by a personal title, e.g., Acting Mayor, Acting
Assistant Vice President


or 
3. If it is a personal title followed by a word or a 
string of words which is on a list of words which com- 
states that isolated occurrences of words which are both
names of places and names of persons, e.g. Alberta, 
Berkeley, Charlotte, Georgia, Selma, Washington are 
place names. 

We also assume that full names not preceded by titles
and composed of ambiguous elements, e.g., Selma Young,
and ambiguous names, e.g., Baker, Beard, Charlotte,
Young, which occasionally occur in a newspaper article
not preceded by titles or initials and not adjacent to an
unambiguous name, also occur elsewhere in the same
article either preceded by personal titles, e.g., Mr. Baker,
Professor Beard, Miss Selma Young, Superintendent
Young, or adjacent to unambiguous names, e.g., Robert
Baker, Charles Beard, Charlotte Corday, Susan Young.
Ambiguous names preceded by titles or adjacent to
unambiguous names are identifiable as names of persons.

We assume further that if a word in a newspaper
article, e.g., Baker, Charlotte, Ford, Young, Washington,
has been recognized as a name of a person, then its other
occurrences in that article, including its isolated
occurrences, are also names of persons.
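
The assumptions about ambiguous names can be illustrated with a short Python sketch. It is not the flow-chart procedure itself: the three word lists are tiny hypothetical stand-ins for the dictionary, and only the immediately neighboring token is inspected. An ambiguous word is accepted as a name of a person if it follows a title or adjoins an unambiguous name anywhere in the article, and that decision is then applied to all of its other occurrences.

```python
# Hypothetical miniature dictionaries, for illustration only.
TITLES = {"Mr.", "Miss", "Professor", "Superintendent"}
UNAMBIGUOUS = {"Robert", "Charles", "Susan", "Corday"}
AMBIGUOUS = {"Baker", "Beard", "Charlotte", "Georgia", "Washington", "Young"}

def label_ambiguous(tokens):
    """Label every ambiguous token of one article."""
    persons = set()
    # Pass 1: find ambiguous words with supporting context.
    for i, tok in enumerate(tokens):
        if tok not in AMBIGUOUS:
            continue
        before = tokens[i - 1] if i > 0 else ""
        after = tokens[i + 1] if i + 1 < len(tokens) else ""
        if before in TITLES or before in UNAMBIGUOUS or after in UNAMBIGUOUS:
            persons.add(tok)
    # Pass 2: carry the decision over to isolated occurrences.
    return [(tok, "person" if tok in persons else "not a person")
            for tok in tokens if tok in AMBIGUOUS]

ARTICLE = "Mr. Washington spoke in Georgia . Later Washington left".split()
LABELS = label_ambiguous(ARTICLE)
```

In this sample the second, isolated occurrence of Washington is labeled a person because the word was recognized after a title earlier in the same article, whereas Georgia is not.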

As a rule, if a word or a phrase in a newspaper article
has been identified as a name of a person performing a
special naming function with regard to the designation of
some other word or phrase, e.g., Everest in Mount
Everest, J. P. Dumont in J. P. Dumont and Company,
Lenin in icebreaker Lenin, then all its isolated
occurrences, e.g., Everest, J. P. Dumont, Lenin, are not names
of persons but namesakes. However, if a word or a
phrase in a newspaper article is identified as belonging to
two or more of the following types of constructions:
(1) the name of a person, Mr. J. P. Dumont, Mr. Washington,
(2) a namesake, J. P. Dumont and Company,
Washington Brothers, (3) a place name, Dumont, N. J.,
Washington, D. C., and so forth, then the ambiguity of
its isolated occurrences is not resolvable by our present
techniques.


Our present rule for identifying namesakes and capitalized
words which are not names states that most
capitalized words at the beginning of sentences which
adjoin personal names without being part of them, e.g.,
Suddenly in Suddenly Robert Green . . ., Encountering
in Encountering June Boas . . ., can be listed or computed
and are therefore identifiable. Likewise, we assume
that most capitalized words inside sentences which adjoin
personal names without being part of them, e.g.,
Monday as in . . . on Monday Robert Green . . ., are
also listable and therefore identifiable.

We also assume that most capitalized words which 
adjoin personal names without being part of them and 
which designate namesakes, e.g., Mount in Mount Ken- 
nedy, Airport in La Guardia Airport, are identifiable 
either by "list lookup" or by means of recognition rules. 


For instance, if the namesake is a geographical term,
e.g., Mount, Street, Lane, Plaza, then that term and the
adjoining name of person are identifiable as a geographical
phrase in which the name of person acts as a proper
name with regard to the designatum of the geographical
term, e.g., Mount Kennedy, Kennedy Plaza. The most
frequently occurring exceptions to this rule can be listed,
e.g., Dame Lane (the preferred interpretation of Dame
Lane is “a Dame named Lane,” rather than “a Lane
named Dame”), Gallo Plaza (the name of the former
U.N. mediator to Cyprus).

If, however, the namesake designates a type of com- 
mercial establishment, e.g., Hotel, Lodge, Radio Repair 
Shop, then that term and the adjoining name of person
are identifiable as a commercial establishment phrase in 
which the name of person acts as a proper name with 
regard to the designatum of the namesake, e.g. Hotel 
Roosevelt, Wilson Radio Repair Shop. The most fre- 
quently occurring exception to this rule can be listed, e.g., 
Ambassador Lodge. 
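
The namesake rules for geographical terms and commercial establishments amount to a list lookup with a short exception list. The Python sketch below is an assumed, simplified rendering: the term lists hold only the examples quoted in the text, and the function names are hypothetical.

```python
# Only the examples cited in the text; a working dictionary would be larger.
GEOGRAPHIC_TERMS = ("Mount", "Street", "Lane", "Plaza")
COMMERCIAL_TERMS = ("Hotel", "Lodge", "Radio Repair Shop")
PERSON_EXCEPTIONS = {"Dame Lane", "Gallo Plaza", "Ambassador Lodge"}

def has_term(phrase, terms):
    """True if the phrase begins or ends with one of the listed terms."""
    return any(phrase.startswith(term + " ") or phrase.endswith(" " + term)
               for term in terms)

def classify(phrase):
    # Listed exceptions are checked first: they are titles plus names.
    if phrase in PERSON_EXCEPTIONS:
        return "person"
    if has_term(phrase, GEOGRAPHIC_TERMS):
        return "geographical phrase"
    if has_term(phrase, COMMERCIAL_TERMS):
        return "commercial establishment phrase"
    return "unclassified"

RESULTS = {p: classify(p) for p in (
    "Mount Kennedy", "Kennedy Plaza", "Hotel Roosevelt",
    "Wilson Radio Repair Shop", "Dame Lane", "Ambassador Lodge",
)}
```

Checking the exception list before the term lists is what keeps Ambassador Lodge from being read as a lodge.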


* Results of the Experiment 


Since our identification rules were embodied in dictionary
entries and flowcharts which were sufficiently detailed
to permit an accurate manual execution of identification
procedures, it was decided that our identification
system would be tested out by hand on a sample of The
New York Times texts.

Identification procedures were applied manually to 
some 40,000 words of texts. Altogether 88 articles from 
11 issues were selected and processed (3, pp. 26-45). Only 
news articles were included in the sample. All materials 
found in the special sections such as entertainment, food-
fashions-family-furnishings, social events, necrology, etc.,
were omitted. Materials in the sample consisted of only 
texts of news articles; picture captions, advertisements, 
italicized lists of various sorts, charts and diagrams, etc., 
were excluded from the data. 

Our 40,577-word sample contained 806 occurrences of 
names of persons. Of the 806 occurrences of names of 
persons, 46 or about 6% of the total were missed. In 
addition, 47 words and word strings were mistakenly 
identified as personal names or personal titles. 

Figure of merit F for the results of this identification
system was computed by means of the following formula:

$$F = \frac{C^{2}}{(C + M) \times T}$$

where C is the number of correct identifications, M is the
number of mistaken identifications, and T is the number
of names of persons in the sample (4).

For T = 806, C = 746, and M = 47,

$$F = \frac{746^{2}}{(746 + 47) \times 806} = .87$$

(Because of our scoring rules [3, pp. 42-45] the number
of identifications and misses does not add up to the number
of names of persons in the sample.)
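
The arithmetic can be checked with a few lines of Python. The sketch assumes that the denominator is (C + M) x T, which is the reading that reproduces the reported value of .87; under that reading F is the product of C/(C + M), the share of identifications that were correct, and C/T, the share of names that were found. The function name is hypothetical.

```python
def figure_of_merit(correct, mistaken, total):
    """F = C^2 / ((C + M) * T), as read from the formula above."""
    return correct ** 2 / ((correct + mistaken) * total)

C, M, T = 746, 47, 806
F = figure_of_merit(C, M, T)

# The same value written as a product of two ratios.
ACCURACY = C / (C + M)      # correct identifications among all identifications
EXHAUSTIVENESS = C / T      # correct identifications among all names
```

Rounded to two places, F comes to 0.87.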


2 Flow charts will be available at a depository library after Septem- 
ber 1967. 
In addition, we need to prevent or eliminate the errors
caused by the assumption that capitalized words occurring
after ambiguous words such as General, Justice,
Major, Principal, etc., are names of persons.

We also require more effective rules to distinguish
strings of titles, e.g., President, Secretary of State, from
titles followed by names. In addition, we need more effective
rules for distributing a title among all names of
persons which follow it in the text, e.g., Senators Vance
Hartke and Birch Bayh of Indiana and Eugene J. McCarthy
and Walter F. Mondale of Minnesota.

Improving the automatic identification system may require
several subsidiary investigations. For instance,
we may be well advised to determine the relationship,
if any, between, on the one hand, the effectiveness of the
system and, on the other, the length, the date, the place
of origin, the subject matter, the authorship, and the type
of newspaper articles processed through the system.


* Some Possible Applications


In the absence of data on the cost of identifying
names of persons by computer, the subject of the applications
of computer programs for identification of
names of persons in newspaper texts must remain in
the domain of speculation.

We would conjecture that if the speed of computation
were high and its price could be kept low, and if the
figure of merit could be raised to .98 or higher, then a
computer program for identifying names of persons in
texts would be worth incorporating into existing information
retrieval systems of large papers and
periodicals.

However, although we have reasons to think that the
figure of merit could eventually exceed .95, we have no
evidence that it could ever exceed .98. It is still unknown
whether a program with a figure of merit lower than .98
would be useful in information retrieval. We would sur-

2. Determining how the names of persons co-occur
with one another and with other words. An automatic
system for recognizing names could be used to produce
a list of headlines of the newspaper articles in which
certain personal names co-occur with certain other personal
names, or with certain proper nouns other than
personal names, or with certain common words, classes
of common words, etc. It would be possible to list all
such articles by date, column, page, etc., and thus to
construct another type of index to the newspaper
morgue.

3. Counting occurrences of names. It would be possible
to count names of different linguistic patterns:
Irish, Chinese, and so forth,
and to study their patterns of occurrence. A different
count could be performed for each different section of
the newspaper: society, business and industry, sports,
entertainment, and the rest. Or a program for automatic
identification of titles and names could be used to
produce references to articles in which certain personal
names or titles occur with certain frequency.

4. Tracing associations between names of persons. A
procedure for tracing associations between names
of persons may consist of the following series of steps:
(a) recognizing names of persons in newspaper articles
X; (b) searching other articles Y for occurrences of
personal names which had occurred in X; (c) recognizing
all names of persons in articles Y', Y'', Y''', etc.,
if they contain any of the personal names found in X;
(d) searching other articles Z for occurrences of personal
names which had occurred in Y', Y'', Y''', etc.;
(e) recognizing all names of persons in articles Z', Z'',
Z''', etc., if they contain any of the personal names
found in Y', Y'', Y''', etc.;
and so forth.
This technique lends itself easily to various refinements.

5. Providing an automatic or semiautomatic service
for answering questions. It would be possible to list certain
names of persons which occur in articles in which certain
key terms, classes of terms, or strings of terms
occur with certain frequency. Programs such as these
may perhaps be helpful in providing answers to miscellaneous
questions of the “Who?” type. With the
addition of routines for identifying dates, place names,
street addresses, and so forth (and perhaps also for
parsing sentences), a program for identifying personal
titles and names in texts may conceivably be developed
into an automatic or semiautomatic service for answering
some questions of “Who?” “Whose?” “Whom?”
“To Whom?” “From Whom?” “By Whom?” “When?”
and “Where?” types.
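
The procedure for tracing associations between names of persons (application 4) is in effect a repeated expansion of a set of names. A minimal Python sketch follows, under the assumption that name recognition has already been carried out, so that every article is represented simply by the set of personal names found in it. The article identifiers, the single-letter names, and the `rounds` parameter are hypothetical.

```python
def trace_associations(articles, seed_names, rounds=2):
    """Expand a set of names through articles that share a name with it.

    articles:   dict mapping an article id to the set of names it contains
    seed_names: names recognized in the starting articles X
    rounds:     how many search-and-recognize cycles to perform
    """
    known = set(seed_names)
    visited = set()
    for _ in range(rounds):
        # Search step: unvisited articles containing any known name.
        hits = {aid for aid, names in articles.items()
                if aid not in visited and names & known}
        if not hits:
            break
        visited |= hits
        # Recognition step: take over all names from those articles.
        for aid in hits:
            known |= articles[aid]
    return known, visited

ARTICLES = {
    "Y1": {"A", "B"},
    "Y2": {"C"},
    "Z1": {"B", "D"},
    "Q1": {"E"},
}
NAMES, REACHED = trace_associations(ARTICLES, {"A"}, rounds=2)
```

Starting from name A, the first cycle reaches article Y1 and adds B; the second reaches Z1 through B and adds D. Articles Y2 and Q1 share no name with the growing set and are never reached.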


Programs for automatic identification of names in texts 
may be useful to many groups, among them: (1) docu- 
mentalists, librarians, and editors concerned with extract- 
ing information from texts and with automating editorial 
practices, research practices, etc., (2) sociologists, political
scientists, and onomasticians investigating the occurrence 
of names in texts, (3) opinion survey and market research 
statisticians concerned with the occurrence of names in 
texts, celebrity ratings, measurement of opinion trends, 
etc. 


* Discussion and Interpretation 


Automatic classification of words and phrases of the
type described in this article can be regarded as a particularly
simple case of machine translation. Our algorithm
attempts to identify and label certain types of words
and word strings and to erase all others. In other words,
the goal of this MT algorithm is text reduction: certain
words and word strings are identified as “pertinent” and
others as “not pertinent”; pertinent words and phrases
are retained and labeled and all others are suppressed.

Even this simple goal requires rules which are rather
complex. However, because many word strings which
algorithms such as this one attempt to recognize have
simple structure (“phrase structure”), they can be generated
and possibly recognized with a reasonable degree
of accuracy by a combination of linguistic and statistical
techniques.

It seems that while researching and developing methods
for automatic classification, it might be wise to set up
efficient methods for manual classification of words in
texts. In addition, it may be advisable to write programs
for automatic extraction of data from texts which were
recognized manually. Such action seems appropriate
since automatic extraction of information from manually
recognized texts would probably constitute a valuable
service, and if and when automatic procedures for identifying
dates, personal names, personal titles, trade names,
company names, numbers and measure words, chemical
formulas, etc., become competitive with manual
ones, the data processing profession will be in possession
of operational computer programs capable of extracting
information from recognized texts.*


* More generally, natural language can be viewed as a macro-language
composed in part or in whole of various special-purpose micro-languages,
each with its own structure which relative to the total structure of
language is quite simple. It may be of some practical and theoretical
interest to investigate (a) the grammars of various special-purpose
micro-languages within natural language, e.g., personal titles, personal
names, dates, various sets of technical and scientific terms, street
addresses, trade names, place names, and (b) their structural and functional
interrelations; and to research and develop automatic procedures
for assigning some words and word strings in computer-readable texts to
appropriate special-purpose natural and artificial micro-languages (the
boundary between the two is not clearly drawn; artificial languages
shade into natural).


It would seem that manual identification and classifica- 
tion of words and phrases in texts could be made to work 
efficiently and might provide a valuable interim service 
while methods for automatic identification are researched 
and developed. 

Of course, the future, or to be more precise, the long-term
future, seems to rest with the automatic identification
of words and phrases in computer-legible texts; however,
more or less elaborate manual identification systems
may have their moment's shine now or soon. One such
procedure would require visual displays, lightpens or
cursors, and computer-legible texts (3, pp. 55-56).
Another one may look approximately as follows:

Clerks would scan printed texts (newspapers, books,
journals, letters, memos, etc.) and locate various types
of words and word strings (dates, names, etc.).

Upon identifying a type of word or word string, a
clerk would underline it and also tag it by means of some
identifying symbol. Next, key punchers would transfer
to punch cards both the tags and the words and phrases
identified by tags. Finally, words and tags would be
transferred from cards to a suitable computer memory.
Words and tags in large computer memories could then
be processed by means of various cross-filing and tabulating
programs. Needless to say, at this time and for
some time to come, the great efficiency of a computer
will be in cross-filing and tabulating.
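
The cross-filing and tabulating step can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the tag symbols, the three-part record (tag, phrase, source document), and the sample entries stand for whatever the clerks and key punchers would actually produce.

```python
from collections import Counter, defaultdict

# Hypothetical punched-card records: (tag, tagged phrase, source document).
RECORDS = [
    ("NAME", "Susan Artandi", "memo-1"),
    ("DATE", "September 1967", "memo-1"),
    ("NAME", "Susan Artandi", "letter-7"),
    ("NAME", "Foster Mohrhardt", "letter-7"),
]

def cross_file(records):
    """File every tagged phrase under its tag, with the sources it came from."""
    filed = defaultdict(lambda: defaultdict(list))
    for tag, phrase, source in records:
        filed[tag][phrase].append(source)
    return filed

def tabulate(records):
    """Count how many tagged items of each type were transcribed."""
    return Counter(tag for tag, _, _ in records)

FILED = cross_file(RECORDS)
COUNTS = tabulate(RECORDS)
```

Looking up `FILED["NAME"]["Susan Artandi"]` then lists both documents in which the clerk tagged that name.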

Other solutions of this type, of KWIC type, and of 
related types could also be tested. 

An important problem whose solution could and perhaps
should be attempted now is the conversion of teletype,
typesetting, typewriter, and other keyed inputs to
computer-legible form. It seems that much could be done
about this difficult but nevertheless resolvable problem
at the base (that is, at the publishing level) and that considerable
technological advance toward text processing
could reasonably be expected from an intelligent cooperation
of the interested parties (documentalists, publishers,
administrators, hardware and software specialists,
linguists, and others).³ What the prospects for such cooperation
are we do not know, but we hope that something
can and will be done to make it a reality.


If this investigation of written language which mixes 
syntax, semantics, and pragmatics has produced some 
interesting observations and leads to interesting practical 
results, this may constitute a case for other resolutely 
empirical and problem-oriented investigations of texts 
which (1) stress formulas and results, (2) shun gratuitous 
axiomatization, and (3) in which the study of written 
language is not factored out into subdisciplines but is 
their product. 


3 This was pointed out to me by Foster Mohrhardt in a personal 
communication. 
is apparent that such coding would facilitate the retrieval
and dissemination of information associated with broad
classes of fiction. In the case of a patron who liked all
mysteries except those by Earl Stanley Gardner, a computer
program could be written to search for “all ‘M’ not
‘ESG,’” for example.
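
A search of the “all ‘M’ not ‘ESG’” kind is a simple Boolean test on the codes attached to each entry. The Python sketch below is not the program described in this article: the representation of a card as a title plus a set of codes, the placeholder titles, and the function name are all assumptions.

```python
# Hypothetical coded entries; "M" marks a mystery, "ESG" one author.
CARDS = [
    {"title": "HYPOTHETICAL MYSTERY ONE", "codes": {"M", "ESG"}},
    {"title": "HYPOTHETICAL MYSTERY TWO", "codes": {"M"}},
    {"title": "HYPOTHETICAL WESTERN", "codes": {"W"}},
]

def search(cards, require, exclude=()):
    """Titles of cards carrying every required code and no excluded code."""
    required, excluded = set(require), set(exclude)
    return [card["title"] for card in cards
            if required <= card["codes"] and not (excluded & card["codes"])]

PROFILE_HITS = search(CARDS, require=["M"], exclude=["ESG"])
```

The same function can serve a retrospective search or a current awareness run; only the set of cards passed to it changes.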

Once written, the search logic corresponding to the interest
profiles could be used either for retrospective
searches or for searches of recent acquisitions. In the 
former case, this method would provide a bibliography 
of the library's holdings in the area(s) of interest; in the 
latter case, the method would be providing a current 
awareness or “alerting” service. For the retrospective 
searches it would seem reasonable to write a unique
profile for each patron; however, for current awareness 
searches it might be better to economize by writing one 
profile for several patrons with similar interests. In any 
case, current awareness searches should be batched and 
run at intervals of time dependent on the rate of acquisi- 
tion for that particular library. If the library has a 
high rate of acquisition, the searches should be run more 
frequently than if the rate of acquisition is low.

To test the feasibility of this method, IBM cards were 
obtained from the public library system in Lake County, 
Indiana. This particular library system does not have 
access to a computer, but it does have unit-record equip- 
ment with which it produces book catalogs. There are a 
little over 100 cards in the sample, and they were 
selected so that half are fiction and half are non-fiction. 
In addition, the non-fiction entries were chosen so that 
a variety of subject areas (and therefore DDC numbers) 
would be represented. A sample, one entry per line, can
be seen in Fig. 1. The data on each card are arranged
by fields. Starting at the left, one finds the classification
number (minus the usual decimal point), the author's
surname, the author's initial(s), a title or an abbreviated 
title, an internal code showing which branch has the item, 
an abbreviation for the publisher, the last two digits of 
the year of publication, and the price of the item. 

The programs were run on the IBM 7040 computer 
at the Indiana University School of Medicine, and 
COMIT II was used as the programming language.
COMIT II, a second generation list-processing language, 
is especially well suited for handling symbolic alpha- 
numeric data, and its use in information retrieval work 
should be investigated more thoroughly in the future, 
particularly since it has been used almost exclusively in 
the area of computational linguistics (1).

Figure 2 shows a flowchart of the program which 
searches for non-fiction entries. Figure 3 shows the 
results of a COMIT program which will search the data 
base for items concerned with “the arts” and “the geography
of modern Europe.” Since cards are being used
for input, the first action is to read the contents of the 
first card into workspace. If a card is found, the program 
has the computer check the classification number. If the 
classification number on the card matches the number 
which the computer has been instructed to find, then
the entire contents of the card are printed out, and the
next card is read into workspace. If the classification
number does not match, the information is deleted, and
the next data card is read into workspace and examined.
When no card is found, the program is terminated. In the
example shown, provision should have been made to ensure
that the “7” and the “914” are associated only with
the classification number. This can be done simply by
using a null constituent to find the left end of workspace
or else by revising and improving the format so that no
other numerals are immediately preceded by spaces.
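
The card-by-card loop and the point about tying the search numbers to the classification field can be shown in Python rather than COMIT. The sketch is an analogue and not a translation of the program in Fig. 3: the card images are invented, and the only layout fact taken from the text is that the classification number, minus its decimal point, is the leftmost field.

```python
# Invented card images; the classification number is the leftmost field.
CARDS = [
    "7595 PAINTER A HYPOTHETICAL ART BOOK",
    "9144 WRITER B HYPOTHETICAL FRANCE GUIDE",
    "5307 AUTHOR C PHYSICS IN 7 LESSONS",
    "6914 AUTHOR D HYPOTHETICAL BUILDING MANUAL",
]

def search_by_class(cards, prefixes):
    """Select cards whose classification number starts with a wanted prefix."""
    selected = []
    for card in cards:                      # read the next card
        # Compare at the left end only, so that digits elsewhere on the
        # card (titles, years, prices) can never cause a match.
        if card.startswith(tuple(prefixes)):
            selected.append(card)           # print the entire card
        # otherwise the card is dropped and the next one is read
    return selected

NONFICTION_HITS = search_by_class(CARDS, ["7", "914"])
```

Anchoring the comparison at the left end plays the part of the null constituent mentioned above: the card with "7" in its title and the card numbered 6914 are both passed over.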

Figure 4 shows a flowchart of the program which 
searches for fiction entries. The results of a COMIT 
program corresponding to this flowchart are shown in 
Fig. 5. Specifically, this program is designed to search 
the data base for all westerns (which are coded “W” in 
this collection) and all the works in the collection by the 
authors Druon, Hamsun, and Spark. In a large collec- 
tion it would be necessary to identify the authors more 
precisely, but using only the last name will illustrate 
the principle. The first rule which operates after a card 
has been read into workspace searches for an initial “W.”
If there is a match, the entire contents of the card are
printed out, and the next card is read into workspace. If
there is no match, the rule fails and control goes to a rule
which finds the author's name. At this point in the program,
all information other than the author's name is
shelved (i.e., stored temporarily in a different place in
core memory), and an internal “dictionary” is searched 
for a match with the author’s name. If a match occurs, 
the contents of the shelves are called for, put in their 
proper positions relative to the name of the author, and 
the entire set of data is printed out. If no match occurs 
in the dictionary, the contents of the shelves are called 
for and are deleted along with the undesired name. Con- 
trol then goes back to the first rule, and another card 
is found or else the program is terminated. 
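
The control flow of the fiction search can be restated in Python. This is a sketch of the logic just described, not of the COMIT rules themselves: it assumes, on the basis of the output reproduced in Fig. 5, that a western begins with the code "W" and that other fiction cards begin with the author's surname. The first three cards are taken from that output; the fourth is an invented card that should be rejected.

```python
AUTHOR_DICTIONARY = {"DRUON", "HAMSUN", "SPARK"}

CARDS = [
    "DRUON M SHE WOLF OF FRANCE",
    "W HUNTER J DESPERATION VALLEY",
    "SPARK M MANDELBAUM GATE",
    "DOE J HYPOTHETICAL NOVEL",     # invented card, not in the sample
]

def search_fiction(cards):
    selected = []
    for card in cards:
        fields = card.split()
        if not fields:
            continue
        # First rule: an initial "W" marks a western; keep the whole card.
        if fields[0] == "W":
            selected.append(card)
            continue
        # Otherwise isolate the author's name (the rest of the card is
        # set aside) and look the name up in the internal dictionary.
        author, shelved = fields[0], fields[1:]
        if author in AUTHOR_DICTIONARY:
            # Recombine the name with the shelved fields and keep the card.
            selected.append(" ".join([author] + shelved))
        # No match: the name and the shelved fields are discarded.
    return selected

FICTION_HITS = search_fiction(CARDS)
```

As the text notes, matching on the surname alone would not be precise enough in a large collection.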

Since COMIT II operates with an interpreter as well
as a compiler, a listing of the program rules and a
terminal dump are obtained when each program is run. 
In addition, notification of a successful or bad compila- 
tion is provided. Both programs described here took 
about 30 seconds to process and run, and each cost slightly 
more than a dollar. Since time and cost do not increase 
linearly with the amount of data, these programs ap- 
parently represent an economically feasible method for 
retrieving this kind of information. It should also be noted 
that the time and cost figures include the time required 
to provide the listing and the terminal dump. By sup- 
pressing these two operations and by using magnetic tape 
for input, one should be able to make this retrieval system 
even less expensive. 

A program of the type described in this article can be
adapted to a variety of situations. The choice of the 
material to be disseminated to patrons should be deter- 
mined by each librarian in light of the needs of his com- 
munity (2). Moreover, it seems apparent that the use 
COMIT 7040/44 SYSTEM UNDER IBSYS OR IBJOB VERSION 1LJULY66

880002

COM DAVIS O01 LIB 2
READ $=//%RCK1 **
* STOP
* $0*Wt$-—*2*34*.//*54AM1 2 3 4 READ
A $0+$8+$4—-+$=24+3+4+5//%Q23 1, *Q33 3 4, *L2 DICT
-DICT HAMSUN= B
SPARK= B
DRUON= B
* $=14+$04+$60//#A23 2, *A33 3 x
* $=0 READ
B $-—t$0*1*$0t*.//*A23 2, *A32 4, *HAML 2 3 4 5 READ
STOP
END

SUCCESSFUL COMPILATION, WORKSPACE CONTAINS 18668 REGISTERS.

DRUON M SHE WOLF OF FRANCE         302 SCRI600450
HAMSUN K GROWTH OF THE SOIL V 1    108 KNO 210200
HAMSUN K GROWTH OF THE SOIL V 2    108 KNO 210200
W BURNETT WR ADOBE WALLS           3A2 KNOP53030
W KREPPS RH GAMBLE MY LAST GAME    142 MACM58032
W HUNTER J DESPERATION VALLEY      308 MACM6403
SPARK M MANDELBAUM GATE            307 KNO 65059
SPARK M MANDELBAUM GATE            308 KNO 650595

18468 REGISTERS OF THE WORKSPACE WERE UNUSED.

COMDUMP OF CHANGED DATA AFTER 292 RULES.
THE WORKSPACE IS EMPTY.
SHELF 23 IS EMPTY.
SHELF 33 IS EMPTY.


Fig. 5. Representative fiction search


of technical services to support reader services is an area 
which public libraries can and should continue to explore
actively. 
The Japan Information Center of Science and Technology
(JICST) was established in 1957, with initial
funds from the Japanese Government and private industry.
The organization now has a full-time staff of
nearly 260 people, of which 25 percent consists of
subject specialists. In addition, there are more than
2,000 outside cooperators for abstracting and translating.
JICST's services include current bibliographic
publications, photocopy, current content-sheet, translation,
and literature search. The charges for these
services give partial support for the activities of JICST.
“JEIPAC” is a special-purpose electronic computer
* Establishment and Purpose

The Japan Information Center of Science and Technology
(JICST) was established in August 1957 according
to the JICST Act (Law 84) and defined as “a central
organization in the country for scientific and technical
information.” It is a nonprofit institution founded upon
government and industrial contributions of 80 million yen
(about $222,220); 40 million yen from the government
and another 40 million from industry, i.e., firms as shareholders.
The idea for such an organization had been
developed since the Science and Technics Agency was
established in the Japanese government in 1956. For the
development of research and study in science and technology
in the country, information control and handling
have been a major problem inasmuch as Japan is geographically
remote from the Western countries and
language problems are a barrier to scientific communication.

The Science and Technics Agency started to study the
† Now doing graduate study at Columbia University School of Library
Service as a fellow of the [...] Medical [...].
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. designed foi JICST's informatión handling, which, Was 
installed, in. 1961. JICST has been engaged i in devel- 
opiñg ` information retrieval systems in several: subject . 


-> areas by using this: machine and it is being used in 


practice for metallurgical, literature search: Although. 
JICST is. Japan's central organization for the dissemi- 
nation of scientific.and technical. information, its ser- 
vices: do: ‘not cover the fields of the vife. sciences. be- 
cause of economic limitations. The present services: of 
'JICST. are mainly «concerned with foreign literature 
relating to me physical sciences... Fr TE oE 


TAKAO ¡FUKUDOME t SLE EOM S 
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. Kitasato Memorial Medical Library, 

- Keto University School of Medizin, I "PI 
Tokyo, Tonus ^. a: E 
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' —— dom T viewpoint of — iterat and me 


asked ‘the-government and industries to’ support a new 
‘enterprise for scientific and technical ' information. -The 
effort resulted in the “Japan Information Center of Sei- 
ence and Technology, Act,” passed by the Diet (Congress) 


^s dmn April: 1957, In order to promote the developmént of 


science and technology in the country, me function: oi 
JICST i ig defined as follows: ac! Se 


1. To dicot both — foreigi — in 
. ' the field of science and technolo 
. 2.. To classify, organize, and retain that information; 
8 To disseminate: that information to. its clientele 
quckly; `“ 
. &.. To solve problems of information: handling. that i 
/ + "individual institutes or — are not’ able to. 
| manage. | | | GEE 


• Organization and Staff

JICST is controlled by the Prime Minister through
the Science and Technics Agency, Japanese government.
The president and auditor(s) are nominated by the Prime
Minister, and the vice-president and director(s) are
nominated by the president with the consent of the
Prime Minister, under the JICST Act, Article 18. There
are now 13 advisors and 40 councilors who are representatives
of learned societies, universities, and industry in
the country (1). The principal chart of the organization
is shown in Chart 1.


CHART 1. Organization of the Japan Information Center of
Science and Technology

Officers:        President; Vice-President; Director(s); Auditor(s)
Advisory bodies: Advisors; Councilors

General Affairs Division
Planning Office
Information Division
Documents Division
    Retrieval Section
    Library
Service Division
    Publication Section
    Photoduplication Section
    Translation Section
    Investigation Section
Osaka Branch
Nagoya Branch


The Information Division is mainly concerned with
editing Current Bibliography on Science and Technology
(Kagaku Gijutsu Bunken Sokuho) (vid. IV-A). The
Retrieval Section, Document Division, is engaged in indexing
Current Bibliography, producing “information
cards” (vid. IV-B), and experimenting with machine
search methods. The Investigation Section of the Service
Division offers literature search, patent literature search,
and abstracting service on request from outside users
who pay service fees (vid. V).

In the first year, 1957, JICST started its operation 
with 62 staff members; the number has been expanded to 
258 (1, p. 2). More than 25 percent of the total staff 
have subject specialties, with at least baccalaureate de- 
grees in science or technology. (In 1963, 90 of 230 staff 
members were subject specialists[2].) In addition, there 
are now about 2,200 outside cooperators for abstracting
and translating. JICST also has two service branches, in
Osaka and Nagoya, which are both important industrial
areas in Japan.


• Acquisition and Collection of Materials


Foreign (outside Japan) current journals and patent
specifications are the most important information sources
in JICST. The Center received 4,135 current foreign
journal titles and 1,729 domestic journals in 1964, and
the number of foreign titles was expected to increase to
about 4,300 in 1965 (1, p. 4). About 1,000 titles
of foreign journals are obtained weekly by air cargo
through agents in Dusseldorf, West Germany, and in
New York.

The distribution of the foreign current journals by country
and by subject is shown in Table 1 (3).

Patent specifications are obtained by air freight from
the United States, England, and West Germany, and
they are limited to the subject of chemistry only (vid.
IV-B). About 25,500 items of the patent literature were
acquired in 1964, and 27,000 items were estimated for
1965 (3, p. 5). The collection of books and monographs is
not as large in number. It consisted of 5,400 foreign and
3,940 domestic titles at the end of March 1964 (3).


• Publications


A. Current Bibliography on Science and Technology 
(Kagaku Gijutsu Bunken Sokuho) 


Current Bibliography is an abstracting journal that 
started publication in March 1958. It is now divided into 
ten series covering more than 4,000 foreign current 
journals. It contained a total of about 300,000 items in 
1964 (3, p. 6). 


1. “Chemistry & Chemical Industry series.” March
1958- 3/m. (Including foreign periodicals only: 94,700
items in 1964.) Compared with Chemical Abstracts,
the series places greater emphasis on technical interpretation
of new products in the field of chemical industry.

2. “General Engineering & Mechanical Engineering
series.” March 1958- s-m. (Including foreign periodicals
only: 46,500 items in 1964.) All articles in such
important publications in this field as ASME and SAE
publications are abstracted, and the main classes in
the series are arranged according to the conventional
classification of industries for the convenience of
subscribers.








TABLE 1

1. Country

The United States                       32%
Great Britain                           18%
Germany                                 15%
France                                   8%
USSR                                     6%
Italy                                    3%
Others                                  18%

Total                                  100%

2. Subject *

Chemistry and chemical industry         27%
Electrical engineering                  15%
Mechanical engineering                  15%
Civil engineering and architecture       8%
Geology, mining, and metallurgy          7%
Pure and applied physics                 6%
Atomic energy                            6%
Others                                  16%

Total                                  100%


* Biological sciences are not covered by the JICST services.
3. “Electrical Engineering series.” March 1958- s-m.
(Including foreign periodicals only: 28,500 items in
1964.)

4. “Geology, Mining, and Metallurgy series.” Sept. 
1958- s-m. (Including foreign periodicals only: 28,200 
items in 1964.) This series especially emphasizes Soviet 
literature, which covers 30 or 40 percent of the total 
items in the series. The abstracts of Soviet literature 
are also longer and more detailed than those of other 
literatures because of language problems. 

5. “Civil Engineering & Architecture series.” Sept.
1958- s-m. (Including foreign periodicals only: 18,600
items in 1964.) In addition to 250 professional journals
in this field, about 600 other journals received in
JICST are screened for this series.

6. “Pure and Applied Physics series.” April 1956- m.
(Including foreign periodicals only: 30,200 items in
1964.)

7. “Atomic Energy—Radioisotope and Radiation
Application series.” April 1961- m. (Including both
Japanese and foreign periodicals and reports: 10,000
items in 1964.) The abstracts in this series are usually
longer than those in the other series; and tables, charts,
or formulas are attached, if necessary. Articles in the
field of nuclear medicine are not included. Japanese
articles are screened from about 1,000 titles received in
JICST.

8. “Business Management series.” April 1963- m. 
(Including both Japanese and foreign publications: 
12,300 items in 1964.) This series covers such subject
areas as operations research, systems engineering, in- 
dustrial engineering, human engineering, quality con- 
trol, etc. It does not include such topics as economic 
statistics and federal or state economics. 

9. “Chemistry in Japan—Japanese Chemical Abstracts.”
January 1964- m. (Including 28,900 items in
1964.) About 500 titles of the 1,000 Japanese journals
screened for the series are publications specializing in
the chemical field. The 500 journals and Japanese
patent literature in chemistry are extensively abstracted
in this series.

The Japanese Chemical Abstracts (Nippon Kagaku
Soran) itself had been published by the Japan Chemical
Research Association since 1877, and was absorbed
as a series of Current Bibliography by JICST in 1964.
During 1958-1959, JICST had already published the
General Index to the Japanese Chemical Abstracts
1941-1955 (Nippon Kagaku Soran Sosakuin), which is
a 15-year author and subject index to the abstracting
journal.

10. “Foreign Technical Information for Smaller Enterprises
series.” Sept. 1965- 3/y. One of the real problems
in Japanese economic and industrial development
is to modernize and orient the management of medium-
or small-sized enterprises, which constitute the majority
of Japanese industry. They do not need scientific or
highly specialized research information, but they do
need more practical information directly concerned
with their own products. This series is trying to find
new customers for the Center's services.


In addition to the abstracts, each series, except series 7
and series 9, has a “news section” in its issues. About 100
titles of periodicals of general science and technology such
as Science and New Scientist, of trade journals such as
Business Week and Europachemie, and of trade newspapers
such as Financial Times and VDI Nachrichten
are screened for this section.
B. The Compiling Process of Current Bibliography


The main duties of the Information Division (vid. II)
are to select each article for abstracting, to check the
abstracts returned from outside cooperators, and to
classify them and assign UDC numbers. The Division
staff consists of information specialists engaged in their
own fields in the categories of the Bibliography series.
When the three series of the Bibliography were first pub- 
lished in the spring of 1958, their contents consisted of 
the translated titles and one- or two-line annotations. 
In the autumn of that year, the annotations of the “Elec- 
trical Engineering" series were extended to “indicative” 
abstracts having about 150 characters. Abstracting by 
outside cooperators also began at the same time. In the 
beginning of 1959, the indicative abstracts appeared in 
all other series. Since 1960, “informative” abstracts have 
been developed with about 300 characters. Japanese is
written with a mixture of Chinese characters and Japanese
syllabary (kana). In general, a Chinese character
can constitute a word; therefore, a 300-character abstract
can contain much more information than one having
the same number of Roman characters in English. The
writing system, however, creates difficulties for mechanization
because of the large number of Chinese characters
and their complexity.

The compiling process of Current Bibliography can be 
summarized in the following nine steps (2, 4): 


1. Newly arrived materials are delivered from the
Library to the Information Division.

2. Each article of those materials is screened for the 
Bibliography by the Division staff. 

3. The selected articles are sent to outside coopera- 
tors for abstracting. The contracted abstractors and 
translators now number about 2,200 specialists; more 
than 80 percent of them work for universities or re- 
search institutes and the rest are staff connected with 
private enterprises. 

4. The abstracts sent back from the outside coop- 
erators are checked by the Division staff. The staff is 
responsible for the quality of the abstracts, uniformity 
of the terminology, and consistency of word usage. 

5. After checking, each abstract is classified according
to JICST’s own classification scheme. UDC notation
is also assigned, since the UDC is widely used in
industrial libraries in Japan. It is said that the use of UDC
for bibliographic control on such a large scale is comparable
to that of the Referativnyi Zhurnal of the Soviet
Union. At the same time, indications are made on the
items which are entered under more than one category
in the series.

6. The abstracts with the classification code, the
UDC number, and the indication for multiple entries,
if necessary, are forwarded to the Retrieval section to
type “information cards”; each abstract is typed on
a 4 x 6 card. (The abstract itself also has to be written
within the limitation of the card space.) The cards for
multiple entries are copied by Xerox 914 in the Photoduplication
section.

7. The "information cards" are returned to the In- 
formation Division, and the duplicated entries are dis- 
tributed to the appropriate series sections. Checking is 
done for typing mistakes and for correctness of the 
classification and the UDC number on the cards de- 
livered from the other series sections. 
8. The finished information cards are sent back to the
Retrieval section again, and those cards are filed according
to classification and are assigned series item
numbers.

9. The filed cards are sent to the Publication section 
for printing by a photo-offset method from full-card 
layout format. 
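
Steps 5 through 8 amount to a small filing routine: one card per abstract, an extra copy for each additional category, filing by classification, and sequential item numbering. The following Python sketch illustrates that flow under stated assumptions; the `Card` fields, the class codes "E1" and "M2", the sample titles and UDC numbers, and the sort key are all hypothetical and are not taken from JICST's actual scheme.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Card:
    """One 'information card' (hypothetical fields, for illustration only)."""
    title: str
    class_code: str                  # category under which this copy is filed
    udc: str                         # UDC number assigned in step 5
    item_no: Optional[int] = None    # series item number, assigned in step 8


def make_cards(title: str, udc: str, categories: List[str]) -> List[Card]:
    """Steps 6-7: one card per category (the original plus its copies)."""
    return [Card(title=title, class_code=c, udc=udc) for c in categories]


def file_and_number(cards: List[Card]) -> List[Card]:
    """Step 8: file cards by classification, then assign item numbers."""
    filed = sorted(cards, key=lambda c: (c.class_code, c.title))
    return [replace(c, item_no=i) for i, c in enumerate(filed, start=1)]


# Sample data (invented): the first abstract is entered under two categories.
cards = (
    make_cards("Creep in steel alloys", "669.14", ["M2", "E1"])
    + make_cards("Transistor amplifier noise", "621.382", ["E1"])
)
filed = file_and_number(cards)
for card in filed:
    print(card.item_no, card.class_code, card.title)
```

The cross-listed abstract appears twice in the filed sequence, once under each category, which mirrors the duplicated cards described in steps 6 and 7.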


As seen above, all the processes have been done by
manual steps. Nevertheless, the time-lag between the date of
original publication and that of the Bibliography is an
average of three months (2, 4). After the printing, the
original cards (information cards) are cumulated in the
form of a card catalog which contained a total of 900,000
cards in 1964 (2, 4).


C. Other publications


1. Foreign Patent News—Chemistry (Gaikoku
Tokkyo Sokuho). April 1958- w.
This publication is compiled from the Official Gazette
(U.S.A.), the Official Journal (Great Britain), and the
Patentblatt (West Germany). It includes the following
information: patent number, classification, title
of the invention (translated into Japanese), applicant(s),
inventor(s), application number, and application
date. In the first year, 1958, the News was
published in three editions for chemistry, electrical engineering,
and mechanical engineering, but its coverage
has been limited to chemistry since 1959. The
patent specifications from the three countries are obtained
by air cargo. The chemistry section of the U.S. Official
Gazette is now fully copied with the permission
of the American Embassy in Tokyo.

2. Japanese Patent Index (Nippon Tokkyo Sakuin). a.
This is an annual index to the Japanese Patent Gazette.
The index consists of an “Applicant index” and a
“Classification number index.” The former is divided
into the following three parts: “Japanese corporate
names,” “Japanese personal names,” and “Foreign applicants.”
It included 19,000 patents in the 1962 edition,
26,950 in 1963, and 30,380 in 1964.

3. JICST Monthly (Joho Kanri). January 1958- m.
This publication was originally a house-organ of JICST,
but it has now established its place as a representative
professional journal of documentation in Japan. The
status of the journal could be compared to that of
American Documentation in the United States, or
Journal of Documentation in England. Its Japanese
title was changed from Gekkan JICST to the present
one in 1963.


• Services on Demand


There are four types of services on demand: photocopies,
current content-sheet service, translations, and
literature searches. These direct requests from users have
a close relation to Current Bibliography and other publications
of the JICST itself. In other words, the demand
services have been produced or accelerated by the publication
service.


A. Photocopy Service


The photocopy service is now the heaviest business
aside from the publication of Current Bibliography itself.
In the fiscal year 1964, JICST filled more than 267,000
requests for photoduplication (1, p. 10). This means that
about 20,000 articles or 130,000 to 140,000 pages were
supplied by the service each month (2). It is estimated
that more than 80 percent of the total requests were
generated from the bibliographic publications of JICST.
The requests for photocopies, therefore, have been
analyzed as a tool to measure the user value of the
publications. For example, a recent survey was made of
the relation between photocopy requests and the “Mechanical
Engineering series” of Current Bibliography. It
is a three-month survey of 9,193 requests from March to
May 1964 (6):


1. The items of photocopies requested in a particular
time corresponded with those in a particular issue of
the series; that is, the majority of the requests represented
the items in an issue which had been published
about 50 days before.

2. There were 70 titles which had more than 30
requests. Some of these were Transactions of ASME,
SAE Transactions, Machine Design, Metalworking Production,
Design News, Machinery, Maschinenbautechnik,
Mass Production, etc.

3. English, German, and Russian, as represented by
the requested literature, show a ratio of 10 : 3 : 1
whereas the languages in the series of the Bibliography
have the ratio of 10 : 4 : 2. Hence, it can be said that
Russian is not yet as popular a language in industry
as English or German.

4. There were 245 articles which had more than four
requests. The contents of those articles were mostly
practical rather than theoretical approaches.

5. There were a great many requests for short articles,
less than a page, which contained news concerning
new products. This fact should be taken into consideration
in the selection process of articles for Current
Bibliography.
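
The comparison in item 3 can be made explicit by converting each ratio into shares and dividing one by the other. The short Python sketch below does this; the "relative demand" figure (share of requests divided by share of abstracts) is an illustrative measure introduced here, not one used in the survey.

```python
def shares(ratio):
    """Convert a ratio such as 10 : 3 : 1 into fractions that sum to 1."""
    total = sum(ratio.values())
    return {lang: n / total for lang, n in ratio.items()}


# Ratios quoted in item 3 of the survey.
requested = shares({"English": 10, "German": 3, "Russian": 1})
abstracted = shares({"English": 10, "German": 4, "Russian": 2})

# Illustrative measure: values below 1 mean a language is requested
# less often than its share of the abstracts would suggest.
relative = {lang: requested[lang] / abstracted[lang] for lang in requested}

for lang in requested:
    print(f"{lang}: requested {requested[lang]:.1%}, "
          f"abstracted {abstracted[lang]:.1%}, "
          f"relative demand {relative[lang]:.2f}")
```

On these figures Russian items make up one-eighth of the abstracts but only one-fourteenth of the requests, a relative demand of about 0.57, which is the arithmetic behind the remark that Russian is less used in industry.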


B. Current Content-Sheet Service 


“Tables of contents” of journals requested by users 
are supplied in photocopies before the materials are 
processed for abstracting. This is a current awareness 
service preceding Current Bibliography. The service be- 
gan in October, 1959, and the photocopies of tables of 
contents are delivered to the customers three times a 
month. This is not an exclusive contents service like the 
Current Contents published in the United States by the 
Institute for Scientific Information, but it is a selective 
service according to the users’ requirements for particular
titles.


C. Translation Service


Most of the translations in the service have been done by
outside cooperators who contract with JICST,
as do the abstracters for the Bibliography. Therefore,
almost all foreign languages can be translated into
Japanese by using the service. Translations from Japanese
to foreign languages, however, are limited to English,
German, French, and Russian. The requests received had
risen to 9,680 in 1962, but the number has gradually
decreased to 5,424 in 1963 and to 4,819 in 1964 (1, p. 10).
In the fiscal year 1961, JICST received 5,022 requests for
translations. The distribution by language is shown in
Table 2 (4, p. 16).

Fifty-four percent of the 5,022 requests were for translations
of the literature abstracted in Current Bibliography.
Others were for materials outside the Bibliography,
including patent specifications, standards, product catalogs,
correspondence, etc.

There is a great need for translation of Japanese
scientific and technical literature outside Japan. For
example, INSDOC in New Delhi has found some difficulty
in obtaining translation service for Japanese and
Chinese literature (6). When Dr. Mohajir, Director of
PANSDOC in Karachi, visited JICST in 1962, he proposed
an exchange plan for translation (4, p. 35). But
the two institutions could not reach an agreement on the cost
of translation. The translation service in PANSDOC has
been done by the full-time staff and offered at a very small
charge. On the other hand, any JICST service has to
cover the expended cost because of its organizational
character as a self-supported institution (vid. VII). However,
there have been 30 to 50 requests a year for translations
from foreign countries, and these have been [illegible].


TABLE 2
Translation Requests

Original-Translation Languages

Russian-Japanese        33.5%
German-Japanese         17.8%
Japanese-English        [illegible]
English-Japanese        12.8%
French-Japanese         12.3%
Others-Japanese          9.8%
Japanese-Others          0.9%

Total                  100.0%


D. Literature Search Service


Correlations have been found among the bibliographic
publications of JICST, photocopy requests to JICST,
and translation requests. The pattern of the JICST users'
approach to information is usually limited to requests
arising from the publications. However, a demand for
literature search by subject may arise separately. These
demands are usually generated by individual users' own
production problems. In many cases, therefore, these
literature searches will concern confidential matters of
individual enterprises. They are afraid that their industrial
security might be violated by using the literature
search service. Hence, such questions from industry frequently
come in as very broad topics or are obscure in
the subject contents. When there is a lack of communication
between the questioner and the search staff, the result
is unsatisfactory. It is important for the service to
establish a [illegible] and to gain the confidence of its
users that they may rely on the discretion of the staff in
matters relating to industrial security.

The literature search service of JICST does not cover
the fields of medicine, agriculture, and biology. The staff
and tools for the service have been well prepared for the
fields of chemistry, pharmacy, electrical engineering,
mechanical engineering, and metallurgy. Exclusive search
for patent applications is also available from this service
section of JICST.

It is expected that the literature search service will be
rapidly developed when the computerization of the information
retrieval system in the Center comes into practice.


• Mechanization of Information Handling

There are two approaches to the mechanization of
information systems in JICST. One is the mechanization
of the compiling process of Current Bibliography, and
another is the information retrieval by a computer system
for the literature search. For the first purpose, IBM
equipment was installed in April 1960: 24 and 26 Printing
Card Punch, 56 Card Verifier, 82 Sorter, and 853 Card
Type (2). They have since been employed for the compiling
of the author index to Current Bibliography, Japanese
Patent Index, periodicals holding lists, and various
statistical reports. A “listcamera” is now being developed
through the cooperation of JICST and Tokyo Micro Co.,
Ltd. The newly designed listcamera is intended to copy
the subject headings and titles from the “information
cards” (vid. IV-B) for compiling a subject index to
Current Bibliography. Some disadvantages of the camera
have not been solved yet: for example, cards have to be
inserted into the machine by hand, so their headlines
come out irregularly (2).

The so-called “JEIPAC” (TOSBAC 4131) is a special
purpose electronic computer designed for the JICST's
information handling which has been used experimentally
since 1961. The processing operation includes sorting,
collation, adding, subtraction, and automatic printing.
The machine ability is explained in the following (7):


The storage medium is magnetic tape, and four tape
reels are standard. A file of documents is entered on
paper tape ... and transferred onto the magnetic
tape through a high-speed photo reader. Each magnetic
tape reel may include about 2,400,000 characters.
One machine word contains 12 characters. Taking 10
as an average number of machine words per document,
20,000 documents can be stored on a tape reel. Tape
scanning speed is 1.5 meters per second, or 9,000 characters
per second. As a memory device, a 60-word
magnetic core memory is installed; 25 words for the
data to be processed; 15 for the question data to be
searched. Then up to 20 words can be reserved for
processing other than searching.
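
The figures in this description are mutually consistent, as a short calculation shows. The Python sketch below simply reworks the quoted numbers; the derived values (characters per meter of tape, tape length, and time for one full pass over a reel) are inferences from those numbers and are not stated in the source.

```python
# Figures quoted in the machine description (7).
CHARS_PER_REEL = 2_400_000
CHARS_PER_WORD = 12
WORDS_PER_DOCUMENT = 10          # average assumed in the description
TAPE_SPEED_M_PER_S = 1.5
CHARS_PER_SECOND = 9_000
CORE_WORDS = 60

# Capacity of one reel.
words_per_reel = CHARS_PER_REEL // CHARS_PER_WORD       # machine words
docs_per_reel = words_per_reel // WORDS_PER_DOCUMENT    # documents

# Derived values (inferred here, not given in the source).
chars_per_meter = CHARS_PER_SECOND / TAPE_SPEED_M_PER_S
reel_length_m = CHARS_PER_REEL / chars_per_meter
full_scan_seconds = CHARS_PER_REEL / CHARS_PER_SECOND

# Core memory allocation: data, question, and other processing.
core_allocation = {"data": 25, "question": 15, "other": 20}

print(f"documents per reel: {docs_per_reel:,}")
print(f"reel length: {reel_length_m:.0f} m")
print(f"one full pass: {full_scan_seconds:.0f} s")
print(f"core words allocated: {sum(core_allocation.values())} of {CORE_WORDS}")
```

A serial search of one full reel would therefore take between four and five minutes at the quoted scanning speed, before any time for rewinding or printing.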


The coding required for an information retrieval system
has been studied by each subject specialists' group
in the Document Division. The experiment on metallurgical
literature is now becoming close to a practical step.


A. The Experiment in the Metallurgy Section (8,9, 10, 11) 


The classification system developed for metallurgical
literature by the American Society for Metals and the
Special Libraries Association (12) has been studied and
modified for the coding of the experiment. The experimentation
by this group has been done in the following
steps:

1. Sampling of key-words from the “Geology, Mining,
and Metallurgy series”

2. Establishing of the coding system

3. Recording bibliographic citations

4. Punching the paper tape for transmitting the
information

5. Storing the information into the magnetic tapes


The 10,392 items in volume 4 of the series already have 
been stored on the tapes; the 13,189 items in volume 5 
were to be stored by October 1965; and the 13,025 items 
in volume 6 by February 1966. Then the total of 36,606 
items could be employed for the literature searches in 
practice (13). 
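
The stated total can be checked, and the size of the resulting file estimated, from the machine figures quoted earlier. The Python sketch below is illustrative only: it assumes that each stored item occupies the average of 10 machine words of 12 characters given in the machine description, which the source does not confirm for the metallurgical file.

```python
import math

# Items per volume of the series, as reported (13).
items_by_volume = {4: 10_392, 5: 13_189, 6: 13_025}
total_items = sum(items_by_volume.values())

# Assumptions carried over from the quoted machine description.
DOCS_PER_REEL = 20_000
CHARS_PER_DOCUMENT = 10 * 12     # 10 words of 12 characters each
CHARS_PER_SECOND = 9_000

reels_needed = math.ceil(total_items / DOCS_PER_REEL)
scan_seconds = total_items * CHARS_PER_DOCUMENT / CHARS_PER_SECOND

print(f"total items: {total_items:,}")
print(f"reels needed: {reels_needed}")
print(f"serial scan of the whole file: about {scan_seconds / 60:.1f} minutes")
```

Under these assumptions the three volumes fit on two of the four standard reels, and a complete serial scan takes roughly eight minutes.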


B. The Experiment in the Chemistry Section (14) 


The most difficult problem of information retrieval in
chemistry is a coding system for the chemical structures
of organic compounds. The notation by the International
Union of Pure and Applied Chemistry (15) and
the information retrieval system for steroid compounds
of the United States Patent Office (16) have been studied
for the coding of chemical literature in JICST. A hand-sort
punched cards system was employed as the first experimental
step, and the result has been transferred into
the “JEIPAC” system. The information is to be categorized
according to the four facets: “Starting material,”
“Type of reaction,” “Products,” and “Object of reaction”
(2).
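
A record organized by these four facets can be searched by matching any combination of them. The Python sketch below shows one way this might look; the record layout, the `search` helper, and the sample reactions are hypothetical illustrations and do not represent the coding actually used in the experiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReactionRecord:
    """One indexed item, described by the four facets named in the text."""
    starting_material: str
    reaction_type: str
    products: str
    object_of_reaction: str


def search(records: List[ReactionRecord], **facets: str) -> List[ReactionRecord]:
    """Return the records whose facets equal every value given in the query."""
    return [
        r for r in records
        if all(getattr(r, name) == value for name, value in facets.items())
    ]


# Invented sample file.
records = [
    ReactionRecord("benzene", "nitration", "nitrobenzene", "synthesis"),
    ReactionRecord("toluene", "nitration", "nitrotoluene", "synthesis"),
    ReactionRecord("benzene", "hydrogenation", "cyclohexane", "synthesis"),
]

hits = search(records, starting_material="benzene", reaction_type="nitration")
for hit in hits:
    print(hit.products)
```

Querying on a single facet (for example, `reaction_type="nitration"`) returns a broader set, which corresponds to searching on one field of a hand-sorted punched card.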


C. The Experiment in the Electrical Engineering Section 


The study by this group concerns the general or basic
problems for information retrieval or coding with semantic
coding approaches. It attempts to formalize
two semantic relations: the relation between a term in the
field of electrical engineering and its object, and the relation
between the terms in the field (4, p. 12). It is expected
that the result would be useful not only for coding the
literature in the field, but also for an automatic conversion
system of symbols in general.


• Conclusion


The function of JICST is to smooth the dissemination 
of scientific and technical information as a central organi- 
zation in Japan, and its activities have been expanded 
since its beginning. However, the financial status of the 
organization has forced it to limit its activities to some 
extent. JICST is not an entirely governmental institution
because half of its initial fund was collected from commercial
industry. Its legal status is that of a so-called
“special corporation.” Although it is a nonprofit institu- 
tion, it is expected to expand self-support operations by 
its own business income. As it is impossible for JICST 
to operate with its business income only, annual contribu- 
tions and subsidies have been given by the government 
since its beginning. For example, the fiscal year 1965 
income budget is estimated to be nearly 800 million yen, 
and about half of that amount comes from government 
funds (1, p. 3). 

Consequently, the services of JICST have emphasized 
meeting demands from industry rather than those from 
academic fields. There are now three major limitations 
to the JICST’s services (17): 


1. The fields of agriculture, fishery, biology, and 
medicine are excluded from the information handling 
systems because the JICST's subscribers largely con- 
sist of industrial companies whose fields of interest are 
in physical sciences. 

2. The foreign patent information is limited to chemi- 
cal subjects in the United States, Great Britain, and 
West Germany. 

3. Japanese literature has not been included in 
Current Bibliography, except the two series of “Busi- 
ness Management" and “Chemistry in Japan—Japa- 
nese Chemical Abstracts.” 


In the field of medicine, however, the Japan Medical
Library Association (founded 1927) has developed
its interlibrary loan system and the union lists of medical
periodicals. As a central abstracting journal for Japanese
medical literature, Japana Centra Revuo Medicina
(Igaku Chuo Zasshi) has been privately published since
1903. A coordinated plan for the indexing of Japanese
literature for the Index Medicus is now under discussion
between the Japan Medical Library Association and the
National Library of Medicine in the United States. In
the field of agriculture, there is a new movement to establish
a nation-wide organization which is supposed to be
called “Japanese Agricultural Library Association.” But
its activities in the field are not yet clear.

JICST is legally under the control of the Science and 
Technics Agency to the degree that its funds are received 
from the government, and it is functionally coordinated 
with the Japanese Patent Office, the National Diet (Con- 
gress) Library, and the Science Council of Japan. The 
distinctions between the Science and Technology Division, 
National Diet Library, and the JICST are not always 
clear to the public in terms of their acquisition policies 
or their bibliographic organization activities as national 
institutions. Internationally, JICST is an associate member
of FID, and the FID 1967 convention will be held in
Tokyo under the auspices of JICST.

A new building for JICST is being built, and it will be
completed by May 1966. The new building was designed
with the estimation that the current periodicals received
in JICST would reach 10,000 titles and the annual abstracting
would surpass 450,000 items in the near future.
But the stack space has eight years' capacity for
the collections, because it is estimated that literature
more than eight years old would become [illegible] used mate-

A Framework for Comparing Term Association Measures

The problem of choosing an association measure for
some particular application is reviewed. Two
methods are presented for treating various measures
in a common framework: a parameterized model
and a graphical interpretation of the measures. Some
association measures which have been suggested to
date are discussed in terms of this framework, and an
example is chosen from the NASA vocabulary.
Qualitative features of the list of associated terms
are related to properties of the measure used, and
it is suggested that this characterization can be useful
in choosing which measure to use.

Many statistical formulas, almost all of them based
on the single theoretical foundation offered by the 2x2
contingency table, have been introduced as candidates
for measuring the degree, Aab, of association between
two index terms a and b. Generally speaking, no firm
relationship between the choice of association measure
and its effect upon evaluated performance has been
established. In part this has occurred because no
opportunity for large scale in-use "tuning" of the measures to
the specific needs of an operational retrieval system
has been pursued to an experimental conclusion. The
study of these measures has tended to stay in a research
context, and few practice-oriented appraisals have been
made. Nevertheless, a few comparative side-by-side
appraisals of the effect of using different formulas have
been conducted in pilot experiments. While clear-cut
preference for one formula over another (because it is
a better discriminator of terms judged to be related)
has not emerged from the experimental tests so far
reported, the insight and experience that has been gained
in laboratory tests has been valuable.

Not surprisingly, each formula has been found to have
some attributes and some deficiencies. Apparently, each
does provide, in practice, a set of associated
terms among which there are many obvious ones.
What is annoying is that no clear-cut criterion for choice
among the alternates has emerged. As a result, few
candidate measures have been permanently dismissed
from consideration, and a rather large set of formulas
remains available.

The argument behind a typical association measure,
when developed along statistical or theoretical lines, offers
little or no basis for distinguishing among them. The
reasoning suggests comparing the number of observed
co-occurrences with the calculated number of expected
co-occurrences. Given two formulas, it will generally be
found that substantially this same supporting rationale
is proffered for both; there are many ways of measuring
statistical surprise or the unexpectedness of an
observation, and a large number of the available formulas can
responsibly claim to do so. Figure 1 exhibits some of the
more familiar measures and records their theoretical
interpretation or rationale.

The fact that a large number of formulas have
apparently survived the efforts of critical researchers to select
among them is a curious problem which faces the serious
student of associative retrieval. There definitely are
differences in how various formulas behave. But choosing
which is "best," even under stated conditions, is a
problem which has only rarely been approached. In this
paper we develop some of the tools we found helpful for
comparing ranked term listings (profiles) produced by
the use of term association measures.

[...]tems Command, Electronic Systems Division, Decision Sciences
Laboratory, under Contract No. AF 19 (828)-8811 with Arthur D. Little, Inc.

If a and b are index terms, tallies of numbers of documents indexed (and not indexed) by
the combinations of a and b are revealed in the 2x2 contingency table:

             a            not a                  Total
    b        fab          fb - fab               fb
    not b    fa - fab     N - fa - fb + fab      N - fb
    Total    fa           N - fa                 N

where fa and fb are the frequencies of terms a and b respectively; fab is the number
of co-occurrences of a and b, and N is the collection size.

Various measures based on this table are:

(I)   $A_{ab} = \dfrac{f_{ab}}{f_b}$
      The conditional probability that, given term b is assigned to a
      document, the term a is also assigned.

(II)  $A_{ab} = f_{ab} - \dfrac{f_a f_b}{N}$
      The difference between the observed number of co-occurrences
      and the expected number based on chance.

(III) $A_{ab} = \log_{10} \dfrac{\left(|f_{ab}N - f_a f_b| - N/2\right)^2 N}{f_a f_b (N - f_a)(N - f_b)}$
      The chi square formula using marginal values of the 2x2
      table and Yates' correction for small samples.

(IV)  $A_{ab} = \dfrac{f_{ab}}{f_a + f_b - f_{ab}}$
      The number of co-occurrences normalized by the
      number of documents indexed by only one of the ...

Fig. 1. The derivation of association measures from the 2x2 contingency table


* Equivalent Association Measures


The use of one of the available statistical
association measures serves two purposes. The first is
to select, for a given header term, a list of associated
terms, a process typically accomplished by specifying
a threshold for the measure above which terms count
as "associated." The second application is to provide
a numerical measure of the degree of association
between the associated terms and the header. A
convenient way to portray the result of applying the
association measure to the data is to rank the co-occurring
terms in a printed list, displaying the terms in decreasing
order of the association measure being used. The order
in which the terms are presented on such a "profile"
exhibits whether one term is more associated with the
header term than another term is.
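
The tallies and three of the measures listed in Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names and the counts (fa = 243, fb = 50, fab = 10, N = 100,000) are assumptions chosen only to show the arithmetic, and the Stiles form follows the chi square expression with Yates' correction as reconstructed from the figure.

```python
import math


def contingency_table(fa, fb, fab, n_docs):
    """2x2 tallies: rows are (b, not b), columns are (a, not a)."""
    return ((fab, fb - fab), (fa - fab, n_docs - fa - fb + fab))


def conditional_probability(fb, fab):
    """Measure I: share of b's documents that also carry a."""
    return fab / fb


def observed_minus_expected(fa, fb, fab, n_docs):
    """Measure II: observed co-occurrences minus the chance expectation."""
    return fab - fa * fb / n_docs


def stiles(fa, fb, fab, n_docs):
    """Measure III: log10 of chi square with Yates' correction.

    Undefined (math domain error) if the corrected numerator is zero.
    """
    numerator = (abs(fab * n_docs - fa * fb) - n_docs / 2) ** 2 * n_docs
    denominator = fa * fb * (n_docs - fa) * (n_docs - fb)
    return math.log10(numerator / denominator)


# Hypothetical counts, chosen only for illustration.
table = contingency_table(fa=243, fb=50, fab=10, n_docs=100_000)
```

The four cells of `table` always sum to the collection size, which is a quick check that the margins were entered consistently.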


In attempting to choose among association formulas,
we postulated that our first concern was to find a
measure which yields an acceptable ranking of associated
terms. The magnitudes of the numerical values for the
degree of association are initially of no interest. What
matters is whether the most closely associated term
under formula A is one of the most closely associated
terms under formula B. If so, the formulas are similar;
if not, dissimilar. Generalizing these ideas, two formulas
which yield the same ranking
of terms in the profile are equivalent from this point of
view.

5 This is true [...] there are ties [...] the association
measure in use.

The notion of equivalent rankings is important
principally because this is the practical way to tell the
formulas apart. One prepares profiles using several
formulas and examines them to see which one places the
most suitable terms at or near the head of the list.
Attention to the numerical values assigned is secondary.
Since we are trying to relate the behavior of the formulas
to the kinds of things a person comparing such profiles
side-by-side would look for, the ordering is the property
to examine first.
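
The notion of equivalence used here compares orderings only, never magnitudes. A minimal sketch of that test follows; the helper names and the toy scores are hypothetical, and ties are broken by the order in which terms were entered (Python's sort is stable), which is one possible convention rather than anything the article specifies.

```python
def rank_terms(scores):
    """Return term names in decreasing order of score (a 'profile')."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [term for term, _ in ordered]


def same_ranking(scores_a, scores_b):
    """Two measures are equivalent here if they order the terms identically."""
    return rank_terms(scores_a) == rank_terms(scores_b)


# Toy scores for three terms under a measure and under its square.
measure = {"x": 3.0, "y": 1.0, "z": 2.0}
squared = {term: value ** 2 for term, value in measure.items()}
```

Here `same_ranking(measure, squared)` holds even though the numerical values differ, which is the sense in which two formulas are treated as interchangeable.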


* Generating a Spectrum of Association 
Measures 


The objective of comparing the ranking behavior of 
various formulas is well served by finding a useful way 
to place them all into the same mathematical form. One 
way to do this is to develop a general expression that 
generates all the formulas of interest and that reduces 
to any specific one by a choice of parameters in the 
general expression. But a glance at the expressions for
Aab in Fig. 1 shows that a general expression that would
include, for instance, formula III as a special case would
be too complicated to manage. Fortunately, by directing
attention to the approximate ranking produced by a
formula, it is possible to use a simple, readily
understandable model for generating a useful spectrum of
alternatives. We shall treat the process of forming an
association list as a stylized method of retrieving certain
documents.

Let term a with frequency fa be the header term for
which we wish to develop a profile. Let some other
term b, with frequency fb, co-occur with a fab times,
as shown by the matrix in Fig. 2. Let us now think of
a as defining (as it clearly does) a set of documents:
those documents indexed by a. Let us think of the other
terms b in the vocabulary (candidates for being
"associated" with a) as single-term requests, and define the
objective of each of these searches to be the retrieval of
those documents indexed by a. In short, the a-indexed
documents (and only those) are "relevant." The
b-indexed documents are "retrieved."

[Figure 2: matrix of N documents (rows) by terms (columns), with the columns for terms a and b marked fa and fb.]

Fig. 2. Document-term matrix showing that term a
(frequency fa) co-occurs with term b (frequency fb)
exactly fab times

With this conceptual attitude, the familiar Recall
and Precision measures can be defined for each term b
(with respect to the given term a). They measure, with
the usual disclaimers, the goodness of b as a substitute
for a.

The Recall of term b is the proportion of documents
indexed by term a which are also indexed by term b:

$$\mathrm{Recall}_b = \frac{f_{ab}}{f_a} \tag{1}$$

The Precision of term b is the proportion of
documents posted to b which are also indexed by term a:

$$\mathrm{Precision}_b = \frac{f_{ab}}{f_b} \tag{2}$$

We now have two measures of b's capability to be
used in lieu of a. But we want only one since the degree
to which b is associated with a is a single number.
Therefore we wish to combine Recall and Precision into a
single measure. The product suggests itself since a term
b with both high Recall and Precision should have a
high association value. But since we have no idea whether
to consider Recall more important than Precision or
vice versa, we multiply them together with adjustable
exponents. Thus a spectrum of directly interpretable
measures of the association of term b with term a is
provided by

$$A_{ab} = \left(\frac{f_{ab}}{f_a}\right)^{(1-n)} \left(\frac{f_{ab}}{f_b}\right)^{n} \tag{3}$$

where n is such that 0 ≤ n ≤ 1. Varying n generates a
variety of association measures, each representing a
different interpretation of the relative importance of
Recall and Precision in this viewpoint towards association
measures.
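
As a rough sketch of the weighted product of Recall and Precision described above, the following functions compute the two quantities and combine them with exponents (1 - n) and n. The function names and the sample counts are illustrative assumptions; only the formulas come from the text.

```python
def recall(fa, fab):
    """Share of a's documents that are also indexed by b."""
    return fab / fa


def precision(fb, fab):
    """Share of b's documents that are also indexed by a."""
    return fab / fb


def association(fa, fb, fab, n):
    """Recall**(1 - n) * Precision**n, with n restricted to [0, 1]."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("n must lie between 0 and 1")
    return recall(fa, fab) ** (1.0 - n) * precision(fb, fab) ** n


# Hypothetical counts: a indexes 100 documents, b indexes 25, 10 in common.
pure_recall = association(100, 25, 10, 0.0)     # equals Recall
pure_precision = association(100, 25, 10, 1.0)  # equals Precision
balanced = association(100, 25, 10, 0.5)        # geometric mean of the two
```

Setting n to 0 or 1 recovers pure Recall or pure Precision, and n = 1/2 gives their geometric mean.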

Since we ascribe little merit to the actual numerical
value Aab given by a measure, putting emphasis rather
on the resulting profile term ranking, we allow ourselves
to alter a given measure (including this one) so long as
the ranking remains invariant under the alteration. Two
alterations of this kind which yield equivalent rankings
are important:


a) Any positive power of a formula which yields
non-negative association measures produces the same
ranking of terms as the original formula.


Proof: $A_{xy} > A_{xz} > 0$ and $k > 0$ implies $A_{xy}^{k} > A_{xz}^{k} > 0$


b) Also, we will usually be allowed to strike the
factor fa from the measure, since it is the same constant
for all the terms in a's profile and therefore does not
affect the ranking.

Applying these rules to Equation 3 yields the equivalent
formulation

$$A_{ab} = (f_{ab})^{(1-n)} \left(\frac{f_{ab}}{f_b}\right)^{n} = \frac{f_{ab}}{f_b^{\,n}} \tag{4}$$

The model presented above thus yields a spectrum of
measures of the type Aab = fab/fb^n. Each such measure is
"rank-equivalent to" (produces the same ranking as) a
formula derived from a specified weighting of Recall
and Precision in this framework.
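
The claim that striking the constant fa leaves the ranking untouched can be checked numerically. The sketch below compares the full Recall-Precision product with the reduced form fab/fb^n on a handful of invented terms; the term names, counts, and the value fa = 243 are hypothetical and serve only to exercise the two formulas.

```python
FA = 243  # assumed frequency of the header term a (illustrative)

# Hypothetical candidates: term -> (fb, fab)
TERMS = {
    "case": (400, 120),
    "motor": (900, 110),
    "seepage": (4, 3),
    "steel": (2500, 60),
    "closure": (60, 20),
}


def full_measure(fb, fab, n):
    """Recall**(1 - n) * Precision**n, keeping the factor fa."""
    return (fab / FA) ** (1 - n) * (fab / fb) ** n


def reduced_measure(fb, fab, n):
    """The same measure with the constant fa struck out: fab / fb**n."""
    return fab / fb ** n


def ranking(score, n):
    """Terms in decreasing order of score(fb, fab, n)."""
    return sorted(TERMS, key=lambda t: score(*TERMS[t], n), reverse=True)


def rankings_agree(n):
    return ranking(full_measure, n) == ranking(reduced_measure, n)
```

For these counts the two forms order the terms identically at n = 0, 1/2, 2/3, and 1, as rule (b) predicts, since they differ only by the positive constant fa^(1-n).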


* Graphical Interpretation of the Rankings 
Produced 


The next task is to find, for the more complex
statistical formulas, which choice of n produces substantially
the same ranking of terms. This will allow us, if we
choose, to interpret those other formulas in a common
framework. Let us therefore examine more closely the
way the choice of n affects the ranking and develop
some of the apparatus for relating n to more complex
measures.

Figure 3 shows a graph of the (fb, fab) space which
is of interest because of the form of Equation 4. Each
term which co-occurs with the header term can be placed
as a point on this graph according to its frequency (fb)
and the number of times it co-occurs with the header
term (fab). (Note that all terms must be located on or
below the 45° line, since fb ≥ fab.)






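[Figure 3: the (fb, fab) plane, with fb on the horizontal axis and fab on the vertical axis; the example point marks the position of a term with frequency 7 which co-occurs with term a 4 times.]

Fig. 3. The (fb, fab) space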


Figure 4 indicates what the distributions of fab and fb
might be* for the terms which co-occur with a given
header term. An association measure is represented
dynamically as a curve of stated shape which moves in
this space, and the ranking of terms on a profile
according to that measure is the order in which this curve
passes points which represent co-occurring terms.


* It would be possible empirically to derive the distribution of
(fb, fab) by sampling a large portion of the data.
While we have the raw data, we have not determined the detailed
distribution. However, it is known by observation that the points
strongly tend to be distributed in the manner shown in Fig. 4, with
a very high density of points in the lower left-hand corner, trailing
off horizontally and upwards. For the present purposes this crude
description of the distribution is probably sufficient, though more
accurate data would be helpful in future work.

[Figure 4: points in the (fb, fab) space, with marginal curves labeled "Distribution of co-occurrences for Term a" and "Distribution of frequencies of terms which co-occur with Term a."]

Fig. 4. Distribution of points in the (fb, fab) space


Viewing the measures in this way allows us to per- 
ceive which areas of the space are passed first and 
therefore which terms are likely to be highly ranked. 

In Fig. 5 we have shown the curves and movements
for various values of n in Equation 4.

When n=1, we have a straight line rotating
clockwise about the origin. This is the representation of the
measure

$$A_{ab} = \frac{f_{ab}}{f_b}$$

which is pure Precision in terms of our model. Note
that the maximum value attainable arises when fab = fb,
i.e., term b co-occurs with term a each time it occurs.
Thus, a term with the values fb=1, fab=1, or
equivalent, must be ranked number 1. No distinction is made
between such a term and one with values fb=10, fab=10,
although there seems to be more evidence in support
of saying the terms are associated in this latter case.

The graphical representation of the case for n=1/2 is
a curve as shown in Fig. 5, again rotating clockwise
about the origin. The measure for n=1/2 is

$$A_{ab} = \frac{f_{ab}}{f_b^{1/2}}$$

In contrast to the case for n=1, this formula ranks
a term with values fb=10, fab=10 higher than a term
with values fb=1, fab=1. This phenomenon is quite
apparent in the dynamic graph since different areas
are covered first by the movement of the corresponding
curves. The bend in the curve for n=1/2 causes it to be
well above the point fb=1, fab=1 when it crosses the
point fb=10, fab=10. In general, measures of the type
fab divided by some root of fb tend to bend more sharply
as n approaches 0.

[Figure 5: curves in the (fb, fab) space, fb on the horizontal axis and fab on the vertical axis, for the measures generated by the model.]

Fig. 5. Graphical representation of various association
measures suggested by the model

The limiting case when n does reach 0 is given by a
horizontal line which moves straight downward with
decreasing fab. This measure ranks the terms in order
of co-occurrence count. (This is pure Recall.)

$$A_{ab} = f_{ab}$$

Not all possible ranking rules fall within the scope
of the model, of course. For example, the extreme
case represented by a vertical line that moves from
right to left is not strictly within the scope of the model,
since n would have to be −∞. It is of interest as an
extreme, however, and for this reason is shown in Fig. 5.
It corresponds to the measure

$$A_{ab} = f_b$$

The dynamic graphical behavior of the formulas 
produced by the model supplies a means for visualizing 
the effects which various choices of n will have on the 
ranking of terms. Other more complex statistical for- 
mulas can also be treated within this same framework. 

We shall show in the next section that the graphical 
behavior of these other formulas often resembles the 
graphical behavior of those generated by the simple model 
with a suitable choice of n. 
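
The effect of n on the two terms discussed above, (fb=1, fab=1) and (fb=10, fab=10), can be reproduced directly. This is a small worked check of the model's measure fab/fb^n; the function and variable names are illustrative only.

```python
def model_measure(fb, fab, n):
    """The model's measure fab / fb**n for a term with counts (fb, fab)."""
    return fab / fb ** n


rare = (1, 1)        # occurs once, always with the header term
frequent = (10, 10)  # occurs ten times, always with the header term

# n = 1 (pure Precision): the two terms tie at the maximum value 1.
tie_at_n1 = model_measure(*rare, 1.0) == model_measure(*frequent, 1.0)

# n = 1/2: the better-supported term is ranked higher.
frequent_wins_at_half = model_measure(*frequent, 0.5) > model_measure(*rare, 0.5)

# n = 0 (pure Recall): ranking is by co-occurrence count alone.
frequent_wins_at_zero = model_measure(*frequent, 0.0) > model_measure(*rare, 0.0)
```

At n = 1/2 the scores are 1 for the rare term and the square root of 10 (about 3.16) for the frequent one, which is the bend in the curve that the graphical argument describes.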


* Relationship with Other Measures 


In general, the shape of a curve corresponding to
some association formula outside our fab/fb^n framework
can be obtained by setting the measure equal to a
constant, provided the formula is a function of (fab,
fb). The constant represents a particular threshold
value; the movement of the curve corresponds to
alteration of the threshold. As the threshold is reduced,
the curve moves downward and a ranking results.

The measure given by

$$A_{ab} = f_{ab} - \frac{f_a f_b}{N} \tag{5}$$

where N = collection size, has been suggested by Maron
and Kuhns (6). When set to a constant (to represent
a particular threshold K), and solved for fab, Equation
5 becomes

$$f_{ab} = K + \frac{f_a}{N} f_b \tag{6}$$

The graphical representation of fab as a function of
fb is thus a straight line with small§ slope fa/N,
emerging from the vertical axis at point fab=K.

As the threshold is reduced, this nearly horizontal
line moves vertically downward. The ranking it produces
can be expected to be very similar to a ranking by
fab alone, except that the slope of the line is fa/N
rather than zero. Note that we cannot strike fa or N
from the formula without disturbing the ranking since
they are not pure additive or multiplicative constants.
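
The threshold line for the observed-minus-expected measure can be sketched as follows. The values used (fa = 243, N = 100,000, K = 28) are assumptions picked to give a slope of .00243, the order of magnitude the article's example works with; the function names are hypothetical.

```python
def observed_minus_expected(fa, fb, fab, n_docs):
    """The measure fab - fa*fb/N."""
    return fab - fa * fb / n_docs


def threshold_line(k, fa, fb, n_docs):
    """Value of fab on the curve where the measure equals the threshold k."""
    return k + (fa / n_docs) * fb


FA, N_DOCS, K = 243, 100_000, 28  # illustrative values only

slope = FA / N_DOCS
fab_at_1000 = threshold_line(K, FA, 1000, N_DOCS)
```

Any point on the line scores exactly K under the measure, and because the slope is so small the line is nearly horizontal over the whole range of fb, which is why the ranking resembles a ranking by fab alone.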

A similar graphical interpretation is given for

$$A_{ab} = \frac{f_{ab}}{f_a + f_b - f_{ab}} \tag{7}$$

a measure suggested by Doyle (6).
When set to K and solved for fab we get

$$f_{ab} = \frac{K}{1+K} f_a + \frac{K}{1+K} f_b \tag{8}$$

Again a linear relationship exists between fb and fab
as in the previous case. However, the slope of the line
approaches zero as the line moves from the top of the
graph (where the slope is 1) to the bottom. (This one
requires more effort than the rest to see clearly in the
dynamic graph.)

In the formula suggested by Dennis (2) and given by

$$A_{ab} = \frac{f_{ab} - \dfrac{f_a f_b}{N}}{\sqrt{\dfrac{f_a f_b}{N}}} \tag{9}$$

we may eliminate fa and N as constants. Setting Aab
to K and solving for fab yields

$$f_{ab} = K \sqrt{f_b} + \frac{f_a}{N} f_b \tag{10}$$

This is seen to be representable as the sum of the square
root curve (i.e., n=1/2) and a straight line with slope
fa/N. Since we would expect fa/N to be small, Equation
9 should yield a ranking similar to the case when n=1/2.
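
The two threshold curves just derived can be verified numerically. The sketch below assumes that Doyle's measure has the form fab/(fa + fb - fab) and that Dennis's measure, once constants are struck, ranks like (fab - fa*fb/N)/sqrt(fb); both forms are reconstructions consistent with the solved curves in the text, and all names and numbers are illustrative.

```python
import math


def doyle(fa, fb, fab):
    """Assumed form of Doyle's measure: fab / (fa + fb - fab)."""
    return fab / (fa + fb - fab)


def doyle_curve(k, fa, fb):
    """fab on the line where the Doyle measure equals k."""
    return k / (1 + k) * (fa + fb)


def dennis_reduced(fa, fb, fab, n_docs):
    """Assumed rank-equivalent form of Dennis's measure."""
    return (fab - fa * fb / n_docs) / math.sqrt(fb)


def dennis_curve(k, fa, fb, n_docs):
    """fab on the curve where the reduced Dennis measure equals k."""
    return k * math.sqrt(fb) + (fa / n_docs) * fb


FA, N_DOCS = 243, 100_000  # illustrative values only

doyle_point = doyle_curve(0.5, FA, 57)              # slope 1/3 at K = 0.5
dennis_point = dennis_curve(1.58, FA, 400, N_DOCS)  # square-root term dominates
```

The Doyle line has slope K/(1+K), which runs from nearly 1 for very large K down toward 0 as K falls, while the Dennis curve is a square-root curve plus a very shallow straight line.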

A measure investigated by Stiles (7) and given by

$$A_{ab} = \log_{10} \frac{\left(|f_{ab}N - f_a f_b| - N/2\right)^2 N}{f_a f_b (N - f_a)(N - f_b)} \tag{11}$$

is approximately equivalent to the ranking produced
by the measure

$$\frac{\left(f_{ab} - \tfrac{1}{2}\right)^2 N}{f_a f_b} \tag{12}$$

when fa × fb is less than N. (See Fossum (8).) Striking
out N and fa we can see that this measure will
approximate the case when n=1/2. The representation of this
measure then will be a curve very similar to

$$f_{ab} = K \sqrt{f_b} \tag{13}$$

The formulas treated above (except for Doyle's) are
thus convertible, without extraordinary difficulty, into
a form where a value of n in fab/fb^n can be assigned as
a crude descriptive parameter. We do this because of
a desire to compare the formulas within the simpler
framework provided by the Recall and Precision model
discussed earlier. The relations will be clearer after
studying the illustrative example in the next section.

§ Only a handful of terms a (in the NASA vocabulary, 10 out of
18,000) have fa/N in excess of .08.
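
How close the simplified expression comes to the full chi square expression can be checked for a single term. The sketch below compares the quantity inside the logarithm of the Stiles measure with the approximation (fab - 1/2)^2 N / (fa fb); the counts are hypothetical and chosen so that fa × fb is well below N, the condition stated in the text.

```python
def stiles_argument(fa, fb, fab, n_docs):
    """Chi square with Yates' correction (the quantity inside the log)."""
    numerator = (abs(fab * n_docs - fa * fb) - n_docs / 2) ** 2 * n_docs
    return numerator / (fa * fb * (n_docs - fa) * (n_docs - fb))


def stiles_approximation(fa, fb, fab, n_docs):
    """Simplified form, intended for fa * fb small relative to N."""
    return (fab - 0.5) ** 2 * n_docs / (fa * fb)


# Illustrative counts: fa * fb = 12,150, well below N = 100,000.
exact = stiles_argument(243, 50, 10, 100_000)
approx = stiles_approximation(243, 50, 10, 100_000)
ratio = exact / approx
```

For these counts the two values agree to within a few percent; since the logarithm is monotone, a close agreement of the arguments is what makes the rankings nearly the same.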


* Illustrative Example


The relationships among the various measures
discussed in the preceding section are illustrated in the
example presented in Fig. 6. For reasons of space and
clarity, the length of the association lists is radically
curtailed. The data are drawn from the NASA
collection statistics, a collection we noted earlier which
contains about 100,000 documents and 18,000 index
terms.

Figure 6 shows the 15 top associates for the
header term "Rocket Motor Case" as ranked by each
of 9 measures. The set of co-occurring terms was first
restricted to include only those terms b such that 2%
or more of their occurrences were co-occurrences with
"Rocket Motor Case," i.e., fab/fb ≥ .02. This
restriction cut the set of candidates to a fraction of the set of
all terms that co-occurred with the header term, and
was necessary because of computer program limitations.
This selection governs all the lists compared in Fig. 6.

Five of the rankings shown are produced by arranging
the terms b according to five choices of n in Equation
4. The remaining 4 were produced by measures
in Fig. 1.

Figure 7 shows the position of the curve representing
each of the measures as it selects its 15th term. That
is, in the positions indicated, each of the measures has
chosen 15 terms associated with "Rocket Motor Case."
The thresholds corresponding to these positions vary,
and for a particular measure the threshold is merely
the value of that measure for the term ranked 15.
Thus, by setting the measure equal to the threshold
represented by the 15th ranked term, the position of
the curve at that point results.

The various term lists shown in Fig. 6 are arranged
systematically according to the average slopes of the
corresponding curve in Fig. 7, beginning with the
vertical line (fb) and rotating counterclockwise until
we reach the 45° line. This arrangement corresponds
to increasing n from −∞ to +1 for those measures
derived from the model; the other measures are
interspersed.

Inspection of the lists in Fig. 6 will quickly reveal
that they all contain a good proportion of terms which
most evaluators would judge to be associated with the
header term, "Rocket Motor Case." This is not
particularly surprising in a vocabulary of 18,000 terms.
There are probably 150 good associates for each middle
frequency term in a vocabulary of this size. Even if
it were practical to print the top (say) 200 terms here,
side-by-side, appraisal of the comparative rankings would
require more effort than the reader would expect to
want to devote. Fortunately, however, we can
characterize each list by describing the types of terms which
tend to appear at the top of the list. These
characteristics are features of the measure which generated
the particular list. Given a specific application for which
the list is to be used, we could then hope to assess in
advance which measures would be expected best to
meet the application's requirements. Essentially, all that
is needed is some statement about which
characteristics of highly associated terms are desirable for the
given application.

The list for n = −∞, i.e., ranking by fb alone, is
surprisingly good, even when we recall the 2% selection
restriction mentioned above. That is, merely selecting
the high frequency terms which co-occur with the
header more than 2% of the time, then ranking them
by decreasing frequency, yields a list of words that is
far from ridiculous. This indicates, in fact, that the
term co-occurrence phenomenon is a stronger effect than
one might be predisposed to suspect. Naturally, this
list contains terms which tend to be very general,
broad, highly used vocabulary terms (by construction).
It has the noteworthy attribute that there are hardly
any terms appearing on this list which one needs
special knowledge to understand. (The vertical line
representing this measure was at fb = 637 when the 15th
term was chosen.)

The terms on the list for n = 0, i.e., ranking by fab,
are quite similar to those on the list for n = −∞. Note,
however, that the constituent terms "case" and "motor"
have moved into prominence on this list. (The
horizontal line which represents this measure had the
position fab = 31 when the 15th term was chosen.)

The ranking of the top 15 terms for the measure
fab − fafb/N is precisely that of the previous measure,
fab. Thus the factor fafb/N was not great enough to
influence the ranking up to this point. Note that the line at
this point has already turned clockwise very slightly,
the equation of the line being

$$f_{ab} = 28 + .00243\, f_b$$

The list given by the measure suggested by Stiles (7)
differs significantly from the previous lists. Technical
terms, like "Hydrotest," "deep draw," and "closure"
begin to be included. This list and the next two are
extremely similar, with only minor permutations of
terms. (The curve for this measure was obtained by
plotting actual points since the equation is quite
complex.) However, Fig. 7 exhibits the obvious similarity
of this and the next two measures, showing that the
approximation in Equation 12 is indeed valid for this
header term. The equations of the curves for the next
two measures are $f_{ab} = 1.58\sqrt{f_b} + .00243\, f_b$ and
$f_{ab} = 1.78\sqrt{f_b}$.

    fb (n = −∞)       fab (n = 0)                fab − fafb/N

    Rocket            Case                       Case
    Propellant        Motor                      Motor
    Solid             Rocket                     Rocket
    Steel             Rocket Engine              Rocket Engine
    Rocket Engine     Solid                      Solid
    Fabrication       Propellant                 Propellant
    Titanium          Steel                      Steel
    Motor             Fabrication                Fabrication
    Glass             Winding                    Winding
    Grain             Filament                   Filament
    Insulation        Solid Prop. Rocket Eng.    Solid Prop. Rocket Eng.
    Welding           Filament Winding           Filament Winding
    Fracture          Titanium                   Titanium
    Bonding           Fiber                      Fiber
    Cryogenics        Glass                      Glass

    Stiles              Dennis (Equation 9)    fab/fb^(1/2) (n = 1/2)

    Case                Case                   Case
    Motor               Motor                  Motor
    Winding             Winding                Winding
    Rocket              Filament Winding       Rocket
    Filament Winding    Deep Draw              Filament Winding
    Stretch Forming     Rocket                 Deep Draw
    Deep Draw           Stretch Forming        Stretch Forming
    Hydrotest           Hydrotest              Hydrotest
    Filament            Filament               Filament
    Rocket Engine       Rocket Engine          Rocket Engine
    Closure             Closure                Closure
    Fiberglass          Fiberglass             Fiberglass
    Stretch             Stretch                Stretch
    Steel               Spiral Wrap            Spiral Wrap
    Fabrication         Steel                  Steel

    fab/(fa + fb − fab)        fab/fb^(2/3) (n = 2/3)    fab/fb (n = 1)

    Case                       Case                      Altair Missile
    Motor                      Deep Draw                 Polaris A2A Missile
    Winding                    Motor                     Seepage
    Filament Winding           Spiral Wrap               Spiral Wrap
    Filament                   Stretch Forming           Stretch Project
    Closure                    Hydrotest                 TU 290 Motor
    Fiberglass                 Seepage                   Turks Head Mill
    Glass Fiber                Winding                   Deep Draw
    Stretch Forming            Filament Winding          Environmental Temp.
    Stretch                    Altair Missile            Fuzz
    Solid Prop. Rocket Eng.    Stretch Project           Helical Winding
    Rocket Engine              Closure                   Stretch Forming
    Reinforced Plastic         TU 290 Motor              Hydrotest
    High Strength              Turks Head Mill           Vasco Jet
    Fabrication                Polaris A2A Missile       Wing IV Motor

Fig. 6. The top 15 associates of the term "Rocket Motor Case" as ranked by each of nine association measures

The ranking by fab/(fa + fb − fab) contains many of the
previously seen terms, plus some specific additions such
as "high strength" and "reinforced plastic." The curve
representing this measure is given at this point by

$$f_{ab} = .041\, f_b + 10$$

Next, the list for n=2/3, i.e., fab/fb^(2/3), shows the inclusion
of some very specific, highly technical terms such as
"seepage," "Polaris A2A missile," and "Turks head mill."
The equation of the curve when it has chosen the 15th
term is

$$f_{ab} = K f_b^{2/3}$$  [numerical coefficient illegible]

Finally, the list given by fab/fb (n = 1) is listed.
Almost all 15 terms on the list are low frequency,
specific terms. The equation of the line is

$$f_{ab} = .460\, f_b$$

at this point.

The discussion above, in conjunction with an
inspection of the curves representing various measures, shows
that the lists corresponding to various n tend to become
more specific and technical in nature as n goes from
−∞ to +1.
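
The selection procedure used in this example (drop candidates below the 2% co-occurrence restriction, rank the rest by a chosen measure, and read off the threshold as the score of the last term kept) can be sketched as below. The candidate counts are invented, and the function is an illustration of the steps described in the text rather than the program actually used.

```python
def top_associates(candidates, score, k=15, min_precision=0.02):
    """Return (top-k terms, threshold), where threshold is the k-th score.

    candidates maps term -> (fb, fab); terms with fab/fb below
    min_precision are discarded before ranking.
    """
    kept = {
        term: counts
        for term, counts in candidates.items()
        if counts[1] / counts[0] >= min_precision
    }
    ranked = sorted(kept, key=lambda t: score(*kept[t]), reverse=True)[:k]
    threshold = score(*kept[ranked[-1]]) if ranked else None
    return ranked, threshold


# Hypothetical candidates: term -> (fb, fab)
CANDIDATES = {
    "case": (400, 120),
    "motor": (900, 110),
    "general": (50_000, 200),  # co-occurs in only 0.4% of its postings
    "seepage": (4, 3),
}

# Rank by co-occurrence count alone (the n = 0 measure), keeping two terms.
by_count, count_threshold = top_associates(
    CANDIDATES, lambda fb, fab: fab, k=2
)
```

Setting the chosen measure equal to the returned threshold gives the position of that measure's curve at the moment it selects its last term, which is how the curve positions quoted in the example arise.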


. “Conclusion — - oe Ñ ut a - . 
“The: — af associated terms ‘produced bs using ' 
pP partieulár association: formula Is. capable ` -of ` being 
"evaluated subjectively (though: crudely) for: its’ value ; 


in ia particular application: - Á rom ‘parametrization. 
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f 


| 9t the, range ‘of possibilities to consider i is presented, id 
the parameter n.is suggested as-a summary statistic. It 


E m to covary with the kinds of subjective charac- | 


teristics of the- lists that bear upon choosing a. formula 
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3500 - I . Bc 


ni 
t 


for :a particúlar application. In particular as m moves | 


toward. Y the lists become ‘increasingly pran xe i 
` gpecialized. 


It is important to repeat — we aacribe no. great o 


i : theoretical importance to the parametrization .in terms ' 
| of. n. We regard it.as a convenient way. to encapsulate . 
. & host of crude but interesting empirical, observations. ' 


We recognize that there are workers who care about, and situations that call for, very detailed attention to the exact value of the association coefficient. We regard the present development principally as a way to narrow the field, to isolate the principal behavioral features of a formula with useful properties. The exact choice, once this rough one has been made, of the particular formula best suited to the application is a matter that those closest to the situation are best [able] to study.

Given the framework developed here, we now turn in conclusion to summarizing our own experience, observations, and impressions as they relate to the [illegible] of profiles.

1. As a [rule] one can find [people] and "applications" for which each of the lists generated (e.g. in Fig. 6) is evaluated as "best." The choice depends both on characteristics of the person using the list and on the nature of the intended application.

2. People, even technical specialists, tend to [reject] the lists where n approaches 1 because of the prominence given to extraordinarily [specific and obscure] terms. People tend to find unfamiliar terms like "Turks Head Mill" disconcerting. However, for fully automatic associative retrieval (when a request is expanded into a profile, and that profile, without being inspected, is used as the weighted search prescription) formulas with n near 1 appear to give the best performance.

3. For the purpose of preparing an [association] listing for use by subject matter specialists either during query formulation or during an interactive retrieval process, the formulas in the vicinity of n = 1/2 appear to be most satisfactory. As a rule, the more special knowledge of the field which one can count on the user possessing, the more we can move toward formulas with higher values of n. The range [illegible] seems most [suitable for] applications of this type.

4. We have had only limited experience with the effort to use an association profile in preparing a printed Thesaurus for general use. Preliminary indications are that measures in the low end of n [range illegible] tend to be preferred.

To make a very broad summary statement, there is a tendency for n to vary more or less in accordance with "how far along in the retrieval process one is." By this we mean that a requestor, entering the retrieval situation for the first time, who is not too sure of what he is looking for or how to express it, is likely to have a preference for lists with low values of n. As he moves along through the search, or gains experience with the collection, the vocabulary, the field, etc., higher values of n become appropriate. (Concurrent with these developments, of course, are considerations of whether the requestor is expanding or narrowing his search at the time he is inspecting a particular association list. Higher values of n tend to be most appropriate for narrowing.) Finally, as the requestor nears the point where he is ready to look at documents, a fairly [high] value of n seems to be most appropriate.

[Varying] the parameter during the course of an interactive search with the requestor on-line may be of value if this apparent trend is substantiated and if the practicalities of the system permit.

But for the present we suggest this view mainly as a bridge by which the experience, intuition, and judgment of those closely involved in a particular retrieval application can be related to the choice of an association measure.

3. GOODMAN, L., and W. KRUSKAL, Measures of Association for Cross-Classifications, Journal of American Statistical Association.
In an earlier paper (1) we presented a method of allocating computer costs in a mechanized storage and retrieval activity. Also included was actual operating cost experience for the Science Information Exchange (SIE) from the period January 1964 to June 1965. System modifications and equipment changes since then have continued to reduce costs further as shown in Table 1.

In May 1965, an IBM 1460 central processing unit replaced the Exchange's IBM 1401. In April 1966, SIE replaced its IBM 1460 array with an IBM 360/30 and added direct access disc capability to the tape oriented system. The master file was retained on magnetic tape and from it an inverted subject file was generated on discs enabling direct access searching. The inverted disc file contains a list of all the subject index points used at SIE. Appended to each point are all the identification numbers of projects which have been indexed with that particular point. This is illustrated in Fig. 1. The search for any subject [point] goes directly to the disc area containing the desired point and immediately supplies the results. Matches for projects containing two or more points are accomplished simply by finding which identical identification numbers appear under each.

The necessary computer programming for direct access disc searching was completed late in the summer of 1966. September 1966 was the first full month in which the inverted subject search system was used for many tasks that would have been batched and run against the magnetic tape master file. The cost reduction per job has been in accordance with expectations, and costs are expected to go even lower as further refinements are made to the operating system.

Area 1 of Table 1 shows a summary of operating experience presented in our previous paper. The batched jobs (i.e., subject or bibliographic searches which for economy considerations were batched and run against a single pass of a master tape file) declined from $37 per job in early 1964 to about $30 per job in mid-1965.
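The inverted-file lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SIE program: the index points, the project identification numbers, and the function names `build_inverted_file` and `search` are all hypothetical, and an in-memory dictionary stands in for the disc file.

```python
from collections import defaultdict


def build_inverted_file(master_file):
    """Invert {project_id: [index points]} into {index point: set of project_ids}."""
    inverted = defaultdict(set)
    for project_id, points in master_file.items():
        for point in points:
            inverted[point].add(project_id)
    return dict(inverted)


def search(inverted, *points):
    """Return, sorted, the project ids that appear under every requested point."""
    if not points:
        return []
    postings = [inverted.get(point, set()) for point in points]
    # A multi-point match keeps only the ids common to all posting lists.
    return sorted(set.intersection(*postings))


# Hypothetical master file: project identification number -> index points.
master_file = {
    101: ["enzymes", "liver"],
    102: ["enzymes", "kidney"],
    103: ["liver", "kidney", "enzymes"],
}
inverted = build_inverted_file(master_file)

print(search(inverted, "liver"))              # one-point search
print(search(inverted, "enzymes", "kidney"))  # two-point match
```

A one-point search reads a single posting list directly, which is the direct access step; a multi-point search is an intersection of posting lists, so no pass over the master file is needed.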
Table 1. Allocation of computer costs. [Column headings: Computer use; Direct computer costs (Batched jobs, Singly run jobs); Total computer costs. Table body not recoverable.]
When a prorated share of the maintenance expense is added, however, the costs range from [figure illegible] to [figure illegible] per search over the same time periods.


Area 2 shows the results of some [system improve]ments plus the faster cycle time of the 1460 central processing unit. The direct cost per search was reduced to about $22, or about $32 per search with the maintenance burden added. The costs per tape search did not decrease further upon installation of the 360 until disc searches were initiated.

Area 3 shows costs for the first month when most of the searches were performed using the disc files. Master tape files still had to be used in certain [illegible] cases. The cost per subject search was further reduced to $11 per job for direct cost and [figure illegible] per job with maintenance burden added.

Figure 2 shows the cost data in [graphic] form. It is interesting to note that the cost which includes the maintenance burden is not as dramatically reduced as is the direct computer cost. This is partially because it requires additional maintenance hours to update the inverted files.


We expect the cost per bibliographic search to decrease further as (1) a greater percentage of bibliographic searches will be accomplished via the disc files, (2) additional improvements are already being made to the operating systems, and (3) computer usage will increase thus decreasing the cost per hour and the prorated burden each task must carry for the maintenance expense.

Fig. 2. Cost for computer searches

Reference

1. MARRON, H., and M. SNYDERMAN, Cost Distribution and Analysis in Computer Storage and Retrieval, American Documentation, 17 (No. 2): 89-95 (1966).
Relevance Disagreements and Unclear Request Forms*

Disagreements about the relevance of documents to retrieval requests occur because relevance judges differently interpret requests or documents. Requests may be differently interpreted because they are unclear. Well-known types of request obscurity are reviewed. Less well known is that a request may be unclear because its form ("documents about subject S," "documents answering question Q," etc.) is unclear.

Explications are developed of the meanings of the request forms just given and several others. A request of any of the forms discussed is interpreted to be for documents which support statements of a specified kind in a specified way. For example, an "about S" request requires documents supporting statements which contain expression S (though several qualifications are needed); a "question Q" request requires documents which support answers to Q. Examples are given which suggest that some, perhaps all, "about S" requests are unclear. Some ways of formulating clear question requests are given.

Various ways in which documents may support statements are distinguished. These depend on such factors as parts of a document used, inference strength, and background knowledge permitted. Some possibly clear support specifications are indicated.

JOHN O'CONNOR†
Institute for Advancement of Medical Communication
Philadelphia, Pennsylvania

1. Background

INTRODUCTION

Document retrieval is the selection of documents of a specified kind from a large collection. A basically important way of specifying the kind of document desired is by subject. For example, "All theoretical papers on nuclear models," "What phosphorus compounds produce neurotoxic effects?" "Inconel and Monel: Composition, properties, weldability, and metallurgy of Inconel-X, Inco-A, Inco-140, and Monel," "What procedure can be used for preparing impurity-free ferric ethylate?"¹

Subject document retrieval for scientists and engineers has received increasing attention in the last two decades. Traditional methods have been analysed further and many new methods have been introduced, some using computers. But it is not clear which methods are most effective, or whether any are effective enough. Therefore in recent years there has been a growing amount of work on testing retrieval methods and systems.

* This research was sponsored by the Information Systems Branch, Office of Naval Research (Contract Nonr N00014-80-00098). Reproduction in whole or in part is permitted for any purpose of the United States Government.

† Present address: Center for the Information Sciences, Lehigh University.

¹ These examples are taken from four different lists of retrieval requests which are described in a table at the end of Part 2. The first example is from the AIP list and the others from (1, 2, 3), respectively.


RELEVANCE DISAGREEMENTS 


Documents which satisfy a subject retrieval request are usually called "relevant" to the request. Thus the basic function of a subject document retrieval system is to provide relevant documents for a request. However in practice there is sometimes disagreement among people competent to judge about whether a given document is relevant to a specified request. For instance, in a retrieval test, two different systems were used to retrieve from the same collection for the same ninety-eight requests. In comparing the retrieval results, the operators of the two systems agreed that 1,390 documents were relevant. But one group claimed relevance for an additional 488 documents and the other group claimed relevance for a different 1,089 documents. These claims were made after each group had examined all documents retrieved by each system (4). As another example, according to a survey of users of the American Society of Metals-Western Reserve University Metallurgical Searching Service, only about half the abstracts sent in response to [requests] were [judged] relevant by the users (5, p. 90), even though the output had been relevance checked by Service personnel.² "This is comparable with results of other systems" (2, p. 28). On the other hand, there was far more agreement among relevance judges in a retrieval experiment with physics literature (6). Fewer than 20 percent of the relevance judgments independently made by two physicists varied by more than two points on a scale of 10. There was then "resolution of all variances" through discussion with a third physicist. Moreover, "most of this variance was attributable to obvious oversight" (6, pp. 1100, 1104).
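The size of the disagreement in the first example can be made concrete with a little arithmetic. The sketch below uses only the three counts quoted from (4); the overlap ratio it computes (documents both groups accepted, divided by documents either group claimed) is an illustrative measure chosen here, not a statistic reported in the study.

```python
agreed = 1390        # documents both groups judged relevant
only_first = 488     # additional documents claimed by one group
only_second = 1089   # different documents claimed by the other group

# Every document that at least one group claimed as relevant.
claimed_by_either = agreed + only_first + only_second

# Share of those documents on which the two groups agreed.
overlap = agreed / claimed_by_either

print(claimed_by_either)   # 2967
print(round(overlap, 3))   # 0.468
```

On this measure the two groups agreed on fewer than half of the documents that either of them regarded as relevant.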


"^ 


² This figure includes an unknown percentage of abstracts judged by users to be relevant but not "useful" (2, p. 28).

RELEVANCE DISAGREEMENTS AND UNCLEAR REQUESTS

The results described in the first example of the preceding paragraph were interpreted by one of the groups involved, ARC (Armed Services Technical Information Agency Reference Center), in the following way:

ARC took the position that since it is difficult to be sure from a semantic viewpoint what a requestor wants, the Center prefers to send the maximum number of references to a requestor rather than take the chance of failing to include a reference which might be useful to him. On the other hand ARC pointed out that it appeared that Documentation Inc. acted upon the assumption that there exists for each question [a] relatively well defined area of pertinent information and that the boundary between a high plateau of pertinency and a surrounding lowland of irrelevant references can be located with [considerable] accuracy from the wording of the request (4, p. 236).

This fails to explain why ARC would not [accept as] relevant 488 documents judged relevant by Documentation Inc. However the passage is noteworthy for suggesting that relevance judges disagree because retrieval requests are unclear. A somewhat more explicit formulation of this idea is Mooers' assertion that, "Any inquiry from a customer can be interpreted in various ways. Depending upon the interpretation, the relevance of the documents produced will change" (7, p. 4).

Some evidence that relevance disagreements are caused by unclear retrieval requests is provided by the physics retrieval experiment (6) in which there was great agreement among relevance judges. A general assumption of that study was that

the requester could in principle communicate [his] requirement with reasonable accuracy to some other person, especially to someone knowledgeable in the subject matter (8, pp. 281-282).

[Accordingly] the questions devised for the experiment were:

highly [specific] and, to the [extent] possible, incorporate[d] [within] themselves the requester's "viewpoint" and "motive." Examples of such questions are: (i) What nuclear reactions are sensitive to the spin and parity of mesons and hence are useful in measuring those quantities? (ii) How does [illegible] polarization [illegible] affect the Coulomb scattering of charged particles by that nucleus? (iii) What are the "magic numbers" for nuclear shell structure? (6, p. 1100.) Should ambiguities of viewpoint arise it [was] postulated that the questions occur in the context of a college examination and that [relevance] be "based [on] a [judgment] under those conditions" (9, p. 9).

KINDS OF UNCLEAR REQUESTS

A number of different ways in which a retrieval request can be unclear are well known in documentation. They are summarily described below.

A request may contain an expression which is ambiguous in the situation. For example, a request "concerned insecticide in control of flies," and the requester judged some documents irrelevant because they were about black flies, etc., and by "flies" he meant houseflies (10, p. 138). As another example, the request, "Performance of engines with liquid injection," was searched by four people as part of a retrieval experiment. Three of the searchers understood "liquid injection" to mean "fuel injection," but the fourth interpreted it to mean "water injection," which was also the intent of the requestor (11, p. 180).

A request may be obscure because it is syntactically ambiguous. For example, would the request "Inconel and Monel: Composition, properties, weldability, and metallurgy of Inconel-X, Inco-A, Inco-140, and Monel" be satisfied by a paper on the weldability of Monel which said nothing otherwise about Monel and nothing about the other alloys named in the request? In other words, what do the and's in the request mean? It seems to be usual practice to understand the and's in a request like this as and/or's. To the extent that is not commonly understood, this request and others of similar [form] are syntactically unclear.

A request may be unclear even though it contains no [ambiguous] subject-matter expressions, and no syntactic ambiguity. For example, consider the request, "All theoretical papers on nuclear models." Would a paper in a journal of pure mathematics which solved a mathematical problem involved in a nuclear model without intending that physical application be relevant to the question? To take another example, for the request, "weldability of Monel," would a paper be relevant which described an efficient method of determining weldability for a class of alloys, including Monel, if the paper did not mention Monel? In general, is a request for papers about subject S satisfied by papers which, roughly speaking, only indirectly say something about S? If so, how indirect may the relation be? This kind of request obscurity might be called "vagueness of scope."

Aside from these problems in interpreting [requests], there may be uncertainties related to what the requester knows and how he will evaluate papers. He may want only documents containing information new to him (perhaps including what he once knew but has forgotten), or he may want thorough retrieval for a comprehensive review. He [may want only] papers intelligible to [him], or


he may accept others as well because colleagues can help interpret them. Perhaps he desires only conclusive papers on the subject, or, alternatively he is, for instance, a drug administrator concerned also with reports of possible side effects. He may wish only "significant" papers, or he may want everything he does not already know on the requested subject, depending perhaps on his general attitude toward use of the literature. Finally if, for instance, he asks for methods of preparing impurity-free ferric ethylate, he may be interested in any preparation methods, or he may want only methods using equipment he possesses; in general he may want any information on the requested subject or he may want only information useful in his particular circumstances.

Even if a requester indicates clearly how new, intelligible, conclusive, significant, and useful he wants information to be, it may still be uncertain what will be new, intelligible, or useful to him, because not enough is known of his background and circumstances. It may also be uncertain what he will judge conclusive, significant, or useful in cases where there can be competent [professional] disagreement in such judging. For example:


At a meeting, following presentation of a paper, Reinmuth (University of Miami) raise[d] questions concerning the validity of the scoring method, the experience of the individuals performing the various examinations, and the meaning of the increased jugular oxygen tensions which have been thought to show increased utilization of oxygen (13).

. . . when, in 1879-1884, Georg Cantor published his fundamental results on [set] theory (now one of the bases of contemporary science), one of them looked so paradoxical and upset so radically all our fundamental notions that it unleashed the decided hostility of Kronecker, one of the leading mathematicians in that time, who prevented Cantor from getting any new appointment in German universities and even from having any memoir published in German periodicals. Of course the proof of that result is as clear and rigorous as any other proof in mathematics, leaving no possibility of not admitting it (12, p. 92n).

[From an exchange of letters on an earlier paper] . . . their conclusion that acetazolamide is a potentially useful tool in teratology must be [tempered] by the following considerations . . . we find it difficult to temper our conclusion that acetazolamide is a potentially useful tool . . . (14).

Obscurities of the kinds [described] in the preceding two paragraphs will be called uncertainties of "user background." The name is only roughly appropriate for uncertainties concerning what a user will judge conclusive, significant, or useful; but that should cause no problem.


There is another way in which a retrieval request may be unclear. A request is sometimes interpreted as representing an interest only partly conveyed by the request's explicit formulation. For instance, a requester who asks, "How can Lissajous figures be generated by digital computer?" may be interested in a description of how they can be generated more exactly at comparable cost by analog computer (15). Or a user who asks, "What procedure can be used for preparing [impurity-free] ferric ethylate?" may be interested in a document naming a commercial chemical supplier of pure ferric ethylate.³ In such cases, the document rather clearly does not satisfy the explicit request, though it may be of interest to the requester; thus the request obscurities of various types described earlier do not apply. However if an explicit request is "read between the lines" by several relevance judges other than the requester in an attempt to satisfy such an interest, they may disagree in interpretation of it. Or they may disagree in interpretation of the request because only one of them assigns an "implicit meaning" to it. Further, a requester may disagree with another relevance judge in regard to the "implicit meaning" of his request because he associates some implicit meaning with the request and the other judge does not, or vice versa, or the implicit meaning he associates with the request is different from that assigned by the other judge. Uncertainties of this kind in interpreting requests will be called obscurities of "implicit meaning."

There may be other general kinds of request obscurity besides those described above. An important one, obscurity of request form meaning, is the central concern of [this paper] (see Parts 2 and 3).

³ The [situations] illustrated by these [examples] have sometimes been described by saying that requesters "do not know what they want" (16, pp. 15-16), or saying that a requester "may be unaware of his real need" (17, p. 81).

OTHER POSSIBLE CAUSES OF RELEVANCE DISAGREEMENTS

Relevance judges may agree on the meaning of a request, but disagree about whether a particular document is of the kind specified by the request. There appear to be no instances of such disagreements described in the documentation literature. However, a relevance disagreement about a document might occur because the document is somewhat unclear, and is interpreted in different ways by different judges. Or there might be a relevance disagreement because the judges have different scientific intuitions about the paper. For instance, if a request is understood to ask for documents which are conclusive, significant, or useful for purpose P, rather than for documents which will be judged by the requester to be conclusive, significant, or useful for purpose P (as was assumed earlier in discussing user background uncertainties), then a disagreement about whether a particular document is, for example, conclusive is a disagreement about the document rather than about the request.

Some disagreements about relevance are the result of careless error in interpreting requests or documents. For example, in a case described previously, according to ARC's analysis of its retrieval failures (492 papers missed by ARC, retrieved by Documentation Inc., and agreed by ARC to be relevant), twenty-five papers were missed "because the original interpretation of the request was inadequate" (4, p. 329). In the physics retrieval experiment described earlier, "most" of the initial relevance
disagreements [later] resolved by [discussion] were "attributable to obvious oversight" (6, p. 1104).

Lack of specialized knowledge on the part of one or more relevance judges, including the requester if he is [asking] outside his specialties, may lead to differing interpretations of requests or documents, and thus result in relevance disagreements. Several examples will be given later.⁴

It has been suggested, in this [and the preceding section], that relevance disagreements may occur because judges interpret requests or documents differently, because of obscurity, disagreements in scientific intuition, carelessness, or ignorance. However, some studies of relevance disagreements have given quite different lists of possible causes. For example, Rees and Saracevic (18) suggest that relevance judgments are affected by such factors as the education and experience of the judge, his work functions (e.g. teaching, research, administration), the purpose he understands the request to have (e.g., solving a specific problem, compiling a bibliography), the environment (e.g. university, industry), the timing he understands the request to have (e.g. different stages in a research project), and the nature of the document representation (e.g. full paper, abstract). However, the two kinds of causes are related. For a judge may interpret a request or a document in a certain way because of such factors as his education, work functions, and environment, the document representation he is given, and what he understands to be the purpose, environment, and timing of the request. A similar remark applies to other lists of possible causes of relevance disagreements which have been given in the literature, insofar as such lists do not specify differences in request and document [interpretation].

⁴ Part [illegible], Background [illegible], second paragraph.

THE BASIC CAUSES OF RELEVANCE DISAGREEMENTS

The basic causes of [relevance disagreements are] differences in interpretation of requests or documents, rather than such factors as the education, etc., of the judges, and what they take to be the purpose, environment, and timing of the request. For if two judges agree on the kind of documents a request asks for, and agree on whether or not a particular document is of that kind, then their relevance judgments necessarily agree. On the other hand, if the judges are similar in education, etc., and agree in their understanding of the purpose, environment, and timing of the request, it is still at least abstractly possible for their relevance judgments to disagree.

Rees et al. (18) and Cuadra et al. (19) are empirically [investigating] how variations in relevance judges' education, etc., and understanding of a request's purpose, etc., are [correlated] with disagreements in relevance judgments. Another worthwhile empirical investigation would be a study of whether particular relevance disagreements are caused by different interpretations of requests or different interpretations of documents, and to see whether the [differences] are caused by ambiguity, disagreements in intuition, carelessness, ignorance, or other factors. How such a study might be conducted is sketched in the next paragraph.

Suppose relevance judges J1 and J2 [disagree about] the relevance of document D to request R. There should then be a discussion among the judges and a mediator. The [judge] who asserts relevance (say J1) should be asked to paraphrase R (call the paraphrase R1), and summarize the characteristics of D which match R1 (call the summary S1). He should be asked to formulate R1 and S1 so that the match between them is unquestionable; for instance R1 = "any paper about A" and S1 = "D is a paper about A." The judge who denies relevance (J2) should then be asked what part of J1's argument for the relevance of D to R he does not accept and why he does not. If he questions the paraphrase of R as R1, then the original request R was unclear in some way, he or J1 has made a careless or ignorant error, or they have interpreted R differently for some other reason. Further discussion should help to clarify which of these is the case. If J2 accepts R1 but questions S1, then the disagreement is about [the] document. Further discussion should help to indicate its specific nature. The general structure of the "further discussions" which would follow initial disagreements about R1 or S1 needs more study before this method of investigating relevance disagreements is tried. Consideration will also need to be given to minimizing construction of ex post facto arguments by judges to support judgments made on different grounds or no grounds.

2. Request Form Meanings

UNCLEAR REQUEST FORMS

In Part 1 it was suggested that [unclear retrieval] requests may cause relevance disagreements. This raises the question of how retrieval requests can be expressed clearly. It might seem that a request with none of the obscurities described in Part 1 would be clear. However, this is not so. For example, even if the request, "papers on nuclear models," has none of those obscurities, relevance judges may still disagree about what constitutes being "a paper on" nuclear models. Similarly, even if the request, "What phosphorus compounds produce neurotoxic effects?" is otherwise clear, relevance judges may disagree about what constitutes being a document satisfying a question request. In general, the meanings of request forms appear to need clarification. Parts 2 and 3 of this paper describe the results so far of a study attempting such clarification. There seem to be no previous [illegible] studies.

REQUESTS FOR DOCUMENTS ABOUT A SUBJECT

A traditional form of request is for documents "about," or "on" a subject, for example, "All theoretical
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question if it is a permitted substitution-instance of the 
question's statement-form.** 

A general restriction on the substitutions in statement-forms of questions is that the results must be in well-formed English (or other discourse language). Therefore care must be taken in writing a statement-form that its structure not exclude as ill-formed some substitutions permitted by the original question. For example, if "What phosphorus compounds produce neurotoxic effects?" will accept as answers not only chemical names but also long descriptions of natural products, then "X is a phosphorus compound which produces neurotoxic effects" would not be a correct statement-form for the question. In general, when a question permits long descriptions to replace a variable, it may be adequate to formulate the question's statement-form with the variable at the end as a blank, allowing substitution of indefinitely long passages. As an example, the question above can be represented by the statement-form, "The following phosphorus compound produces neurotoxic effects: ... ."
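The idea of a question as a statement-form with a blank at the end can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not part of the explication itself: the class `StatementForm`, the placeholder `{X}`, and the `permitted` predicate are hypothetical names, and the predicate used here (any non-empty filler) merely stands in for the requirement of well-formed English, which cannot be checked this simply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StatementForm:
    """A question as a statement-form with one blank plus a substitution condition."""
    template: str                     # contains the placeholder "{X}" for the blank
    permitted: Callable[[str], bool]  # hypothetical condition on what may fill the blank

    def instance(self, expression: str) -> str:
        """Return the substitution-instance obtained by filling the blank."""
        return self.template.replace("{X}", expression)

    def is_answer(self, statement: str) -> bool:
        """True if the statement is a permitted substitution-instance of the form."""
        prefix, _, suffix = self.template.partition("{X}")
        if not (statement.startswith(prefix) and statement.endswith(suffix)):
            return False
        if len(statement) < len(prefix) + len(suffix):
            return False
        filler = statement[len(prefix):len(statement) - len(suffix)]
        return self.permitted(filler)


# Blank at the end, so that indefinitely long descriptions can be substituted.
question = StatementForm(
    template="The following phosphorus compound produces neurotoxic effects: {X}.",
    permitted=lambda filler: bool(filler.strip()),
)

print(question.instance("compound A"))
print(question.is_answer(question.instance("a long description of a natural product")))
print(question.is_answer("Monkeys are docile."))
```

Placing the blank last keeps the surrounding sentence grammatical whatever the length of the substituted passage, which is the reason given above for preferring this form to "X is a phosphorus compound which ...".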

A question which seems clear may not be completely 
so, in the sense that it is uncertain which statements are 
answers to it. For instance, is the following statement 
an answer to the question, “What is the fastest add-time 
on current computers?”—“The fastest add-time on cur- 
rent computers is less than a microsecond and more than 
one-tenth of a microsecond”? As another example, is the 
following statement an answer to the question, “What 
adult monkeys are docile enough for laboratory use?"— 
“Adult stump-tailed macaque monkeys which are handled 
regularly are docile enough for laboratory use” (22)? If 
an attempt is made to represent either of these questions 
as a statement-form plus substitution conditions, the un- 
certainty of what constitutes an answer appears as an 
uncertainty about what the substitution restrictions are. 

Suppose one tries to formulate completely clear ques- 
tions by expressing them directly as statement-forms plus 
substitution conditions, rather than using natural lan- 
guage question formulations. The substitution conditions 
can be specified in various ways. One might permit any 
substitution-instance of the statement-form which is a 
well-formed statement of English (or other discourse 
language). A question so formulated is clear to the ex-
tent that “well-formed statement of English,” for in-
stance, is clear. More restrictive substitution conditions
can be clearly formulated by explicitly listing which ex- 
pressions may be substituted in the statement-form, or 
by naming an existing list of them (for instance, any 
drug listed in a particular pharmaceutical handbook). A 
different kind of substitution condition specifies the type 
of expression which may be substituted, without giving 
or naming an existing list. Some examples are: any 
natural number name, any chemical name, any monkey 
species name, any natural number name followed by
“nanosecond” perhaps followed by “plus-or-minus” and
the name of a natural number no more than 100. Some
further examples are: any substance description, any
description of a chemical laboratory procedure, any
monkey species name accompanied by a description of a
laboratory treatment of monkeys.


11 This explication of questions is suggested in (21, pp. 882-884).
However, the authors treat yes-no questions by using a variable re-
stricted to "true" and "false," rather than in the way illustrated by
the bevatron example.


The examples in the last sentence of the preceding 
paragraph indicate that a substitution condition which 
specifies a permitted type of substituted expression may
not be completely clear. For example, there are chemical 
procedures, close in magnitude to industrial processes, 
which competent judges might disagree about calling 
“laboratory procedures”; “laboratory treatment of 
monkeys" is similarly vague. As an example of a dif- 
ferent kind, a paper reporting discovery of a new nerve 
fiber, say the Volk-Smyth fiber, would support the state- 
ment, “A phosphorus compound which destroys the 
Volk-Smyth fiber is a phosphorus compound which pro- 
duces neurotoxic effects." Similarly, a paper reporting a 
new animal tranquilizer, say Calmatine, would support 
the statement, “Adult monkeys treated with Calmatine 
are docile enough for laboratory use.” But competent 
judges might disagree about whether these statements
are answers which satisfy the respective substitution con-
ditions, “any substance description” and “any description
of a laboratory treatment of monkeys.” In general, tech- 
niques for clearly formulating substitution conditions 
need further investigation. 
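
The representation of a question as a statement-form plus a substitution condition can be illustrated with a small sketch. This is only one possible encoding, not a procedure given in the paper: the `{X}` variable marker, the class and function names, and the use of digit strings in place of "natural number names" are all assumptions made for illustration. Note that the check concerns only whether a statement is a permitted substitution-instance, not whether it is true.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable

VARIABLE = "{X}"  # hypothetical marker for the variable in a statement-form


@dataclass(frozen=True)
class Question:
    """A question represented as a statement-form plus a substitution condition."""
    statement_form: str             # must contain VARIABLE exactly once
    permits: Callable[[str], bool]  # the substitution condition

    def is_answer(self, statement: str) -> bool:
        # split() yields exactly two parts only if VARIABLE occurs once
        prefix, suffix = self.statement_form.split(VARIABLE)
        if len(statement) < len(prefix) + len(suffix):
            return False
        if not (statement.startswith(prefix) and statement.endswith(suffix)):
            return False
        substituted = statement[len(prefix):len(statement) - len(suffix)]
        return self.permits(substituted)


def listed_in(expressions: Iterable[str]) -> Callable[[str], bool]:
    """Condition given by an explicit (or named existing) list of expressions."""
    allowed = frozenset(expressions)
    return lambda expr: expr in allowed


def nanosecond_value(expr: str) -> bool:
    """Condition given by a type of expression: a number followed by
    'nanosecond(s)', perhaps followed by 'plus-or-minus' and a number
    no more than 100. Digits stand in for number names (an assumption)."""
    match = re.fullmatch(r"(\d+) nanoseconds?(?: plus-or-minus (\d+))?", expr)
    if match is None:
        return False
    return match.group(2) is None or int(match.group(2)) <= 100


add_time = Question(
    "The fastest add-time on current computers is {X}.", nanosecond_value
)
# Hypothetical handbook list, using the fictitious names from the text.
handbook = Question(
    "{X} is listed in the handbook.", listed_in({"Lethenone", "Calmatine"})
)
```

Under this explicit condition the doubtful statement from the add-time example ("less than a microsecond and more than one-tenth of a microsecond") is simply not an answer, because the substituted expression is not of the permitted type. The vaguer type conditions discussed above ("any substance description") have no comparably simple predicate, which is the difficulty the text points out.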


SOME EXTENSIONS OF QUESTION AND ABOUT REQUESTS 


A retrieval request which is simply a subject expression,
for example, “Bent crystal spectrometers” (23, p. 47), is
often interpreted as a request for papers about that sub-
ject, for instance, papers which say something about bent
crystal spectrometers. However, in some cases a subject
expression used as a request may be intended as a ques-
tion request. For instance, “Precise measurements of Q-
value via mass spectrometry and nuclear reactions both”
(23, p. 47) may be a request for papers answering the
question, “What values have been produced by precise
measurements of Q-value via mass spectrometry and
nuclear reactions both?” rather than a request for papers
saying anything about such measurements, for example,
describing techniques for performing them. In general,
requests which are subject expressions might be equiva-
lent to either “about” or question requests.

Some requests have imperative form, for example, “List
the types of pi mesons and explain why each must
exist,” 12 and “Provide bibliography on thin films” (3,
p. 51). The first of these examples appears to be equiva-
lent to a question, “What are the types of pi mesons and
why must each exist?” A number of imperative requests
appear to be similarly equivalent to questions. The sec-
ond example is a request for documents about thin films.
“About” requests seem often to be imperatives.


12 An example from the data on which (6) is based. See the table at
the end of the next section.

Some requests can be interpreted as combinations of
simpler requests, for instance, “Which species of Aphidi-
dae attack leguminous crops in the United Kingdom and
how can they be controlled?” (24, p. 30), or the “[...]
and Monel” illustration used earlier.

A request may give instructions for forming many dif-
ferent subject expressions, and can be understood as the
disjunction of those expressions. A lengthy example
(3, p. 49) is the following, which includes interchanges
between the searchers and the requester:


Metal-gas reactions at elevated temperatures

[...] the area under consideration here covers the
broad area of surface reactions of metals with gases at
elevated temperatures. This would include oxidation,
pitting, scaling, thin film formation, tarnishing, etc.
The metals and alloys considered would not be re-
stricted to materials now used in the glass industry such
as cast iron, platinum, etc., but would be all inclusive.
Both the kinetics and mechanism of these reactions
are of interest. With respect to specifying the gases to
be considered, the search should be restricted to oxygen,
air, [...] water [...] dioxide and hydrogen sul-
fide, sulfur and sulfur compounds (inorganic, [...]),
and mixtures of the above. Also of interest would
be chemisorption of these gases on metals.


Presumably “kinetics of aluminum-oxygen reactions at ele-
vated temperatures” is a subject expression specified by
this request, and so is any expression obtained from it by
making one of the indicated substitutions. The “etc.” in
one of the sentences of the request perhaps allows for
an indefinitely long disjunction of subject expressions,
and may help make the request somewhat unclear.


SOME DATA ON REQUEST FORM FREQUENCIES


[...] lists of [...] requests were ex-
amined to find instances of request forms not yet con-
sidered. The instances found will not be discussed in the
present paper. However, the frequencies of various re-
quest forms in the lists are given in the table at the end
of this section. Several things should be said about how
the table was compiled. A request was classified “Other”
if it contained, roughly speaking, “documental” terms
rather than just subject-matter terms. Some examples
are: “What research and development is going on in the
field of engines and automotive components involving new
lubricants, fuels, and power transmission fluids?” (3,
p. 56), in which “in the field of” is a documental expres-
sion, and “List all papers involving the study of human
volunteers” (1, p. 64), in which “papers involving the
study of” is documental. Exceptions to this procedure
were requests explicitly asking for documents on or about
a subject, such as “Prepare bibliography on the stress
corrosion of steel” (3, p. 58), or “Is there a recent book
or paper, preferably in English, on the development of
rockets?” (24, p. 32). Nine imperatives which seemed
equivalent to questions were counted as questions. Com-
pound requests were counted as having the form of any
constituent request; there were no mixed compounds.
Requests which gave instructions for forming disjunctions
of subject expressions (all in 16) were classified as subject
expression requests. A request which consisted of just a
subject expression was so classified, without an attempt
to interpret it as equivalent to either a question or an
“about” request (Table 2).

The requests in Table 2 are based on examination of
every tenth request in each list. The number in parenthe-
ses in each case is the randomly selected number of the
first request examined.
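
The sampling procedure just described (every tenth request in a list, beginning at a selected starting number) is ordinary systematic sampling, and the tallying of request forms can be sketched in a few lines. The function names and the form labels below are assumptions for illustration; assigning a form to each request remains a human judgment made by the rules in the preceding paragraph.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def systematic_sample(
    requests: Sequence[str], step: int = 10, start: Optional[int] = None
) -> Tuple[int, List[str]]:
    """Return the 1-based starting number and every `step`-th request from it.

    If `start` is not given, it is drawn at random from 1..step.
    """
    if start is None:
        start = random.randint(1, step)
    if not 1 <= start <= step:
        raise ValueError("start must lie between 1 and step")
    return start, list(requests[start - 1::step])


def tally_forms(forms: Sequence[str]) -> Counter:
    """Count how often each (hand-assigned) request form occurs in a sample."""
    return Counter(forms)


# A hypothetical list of 47 requests, identified here only by number.
numbered = [f"request {n}" for n in range(1, 48)]
start, sample = systematic_sample(numbered, step=10, start=8)

# Hypothetical hand-assigned forms for the four sampled requests.
counts = tally_forms(["Question", "Question", "Subject expression", "Other"])
```

Recording the starting number alongside the counts, as the table does in parentheses, lets a reader reconstruct exactly which requests were examined.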


3. Document-Statement Inference


In Part 2 retrieval requests of various forms were inter-
preted as requests for documents supporting statements
of specified kinds. However a document may support a
statement in a variety of different ways. Some of these
are described in this section.

For this discussion it will be supposed that a judge with
appropriate competence is asked to decide whether a
particular document D supports statement P by an
inference of type I. An inference type I is specified to the
judge by an instruction, for instance, “Accept the paper
uncritically, use any physics as background knowledge,
and infer P conclusively.” The example just given is a
composite inference type, while the types of inference


13 For access to the data on which (3) is based, the author is grateful to Christine Montgomery and J. L. [...] of the Bunker-Ramo Corporation.

Request Form
rt . T jd I . ` J r ! : I (US `=: P a - x d 
27 tnos ^ titBeüdmes ^, "Documenta : .Cireumstanees ts, 
hA AN Caco Aou rm use DONA 'Bübjéet XM ofréquest. . | ^ ^ Y um 
—— — | aud subject expression Other... '. — P z — 
— Pp: 47-66 — V — 16 | . 3. i Spn to OPOPA AOSE Et f WU owe c 
‘ U^ 1, Pp. 64-65. , l 22 E PRO sa as 1 s 2 Et x : T M ET — E 
. 23, pp, 47-49 . E 2 33 ; 15 |, ‘real, to iypothesized te — F 
— Ec Se. Sm ae ^ ideal system | 3.45 wu. E CORN a 
em 24, Pp: 30-32 BEEN (6 DET S 9 . '' some real "m n i — 
š i, ' > too, . ' 1 
: 2018. . o . 48 > —. eom -2 > ^ - invented, | " Lm 
.35, pp. KEK | "cu A x ‘10° , LN invented. l — 


Table 2


Documents 
Source 
. about a Subject 
| Question subject expression 
3, pp. 43-62(8) 20 E — 
ATpi« — 2 13 
11, pp. 130-134(7) — . — 10 


26, pp. 84-01(4) — 14 — 
27, pp. 185-200(2) 17 + 2 1 
28, pp. 781-796(4) 4 — 


Request Form | 


. Circumstances 
of request 
Other 


(oi. real, to hypothesized 
ideal system 
12 real, to hypothesized 
ideal system 
— . invented 
— invented 
8 . real 
2 invented 


14 For access to all the retrieval requests of which a sample is given in (23, pp. 47-49), the author is grateful to Pauline Atherton of the
American Institute of Physics.


described below will be elementary (at least apparently
so). Each elementary type of inference will be described
in the form of an instruction.


AUTHOR INTENT 


The support of a particular statement by a document
need not have been intended by the author. Authors 
sometimes overlook even important and obvious conse- 
quences of their work. For example: 


Two theorems, important to the subject, were such 
obvious and immediate consequences of the ideas con-
tained [in Hadamard’s thesis] that, years later, other
authors imputed them to me, and I was obliged to con- 
fess that, evident as they were, I had not perceived 
them (12, p. 51). 


[A chemist] had done some experimental work in 
1955 and had published a report without fully realizing 
the relevance of his work to the chemical theory of a 
certain reaction mechanism. Between 1955 and 1957, 
he was led to earlier literature which suggested this 
significance of his work to him. During the same period, 
this fact was also brought home to him through three 
contacts with other scientists which had ensued from
his work in three quite independent ways (29, p. 207). 


The inference judge can be instructed to infer state- 
ment P from document D only if P was intended as a 
consequence of D by D's author. However, in some cases 
this may be difficult or even impossible to decide. The 
Hadamard example illustrates that competent readers of
a document can sometimes be mistaken about what its
author intends. Therefore, alternatively, the inference 
judge might not be given any instruction concerning 
author intent. 


INFERENCE BASIS


The judge may be instructed to accept all of document
D as a basis for inference, or to accept only certain parts
of it (for instance sections headed “Methods” and “Re-
sults”), or certain kinds of statements in it (for example,
descriptions of what was done and observed). If kinds of
acceptable statements are specified, the specifications may
not be completely clear. For example, competent judges
may disagree about “what was done and observed” in a
particular experiment, even if it is reported in a well-
written paper. An illustration of this is the following:


Consider two microbiologists. They look at a prepared 
slide; when asked what they see they may give dif- 
ferent answers. One sees in the cell before him a cluster 
of foreign matter; it is an artifact, a coagulum resulting 
from inadequate staining techniques. This clot has no 
more to do with the cell, in vivo, than the scars left 
on it by the archaeologist’s spade have to do with the 
original shape of some Grecian urn. The other biologist 
identifies the clot as a cell organ, a “Golgi body.” As
for techniques, he argues: “The standard way of de-
tecting a cell organ is by fixing and staining. Why
single out this one technique as producing artifacts,
while others disclose genuine organs?” 15


The judge may also be instructed, concerning any por-
tions of D he is not asked to accept immediately, to admit
them to the inference basis if he thinks they are reliable.
He may further be asked to add qualifying phrases where
he thinks such addition will make passages reliable. Ex-
amples of such phrases are “perhaps,” “for the user popu-
lation sampled,” “in that sense of ‘Algol-like,’” “if one
can assume consistency,” etc.16 Such modified passages
are then also to be added to the inference basis. Note
that two competent judges may disagree about what pas-
sages in a document are reliable, and what critical annota-
tions are necessary. This is illustrated by the examples
given in Part 1 of disagreements about the conclusiveness,
significance, and usefulness of documents.17

For the rest of Part 3, any reference to inferences from 
document D shall mean inferences from a basis derived 
from D. 


TYPE AND STRENGTH OF INFERENCE


The judge may be instructed to determine only whether
D formally implies P, or to consider also the possibilities
that P and D are connected by a mathematical statistical
argument, or by a nonmathematical probable inference.
The last mentioned type of reasoning is illustrated by the
inference from biochemical information and animal test
results that a particular drug is safe for human testing, or
the inference that a physical constant has a certain value
because measurements of it by several independent meth-
ods have given closely agreeing results.


15 Hanson cites “the papers by Baker and Gatenby in Nature, 1949 to
present” (30, p. 1).

16 This kind of annotation only decreases (roughly speaking) what can
be inferred by D. Annotations which increase D’s power are background
knowledge augmentations, which will be discussed later.

17 Part 1, Kinds of Unclear Requests, sixth paragraph.


If the judge is instructed to look for a probable infer-
ence of either kind, he must also be told how strong it
must be. For nonstatistical probable arguments there
appears to be no precise language for specifying inference
strength (see, for instance, 31, p. 212). Only an approxi-
mate language seems to be available, using such expres-
sions as “a bit of evidence for,” “moderately supports,”
“makes highly probable,” etc. It is uncertain how clear
this language is in various circumstances. It should also
be noted, concerning informal probable inferences, that
two competent judges may disagree about the strength
of a particular inference. Examples of such disagreement
were given earlier.18 An illustration of a more general
kind is the following passage quoted by Polanyi (32)
from the directions to Royal Society referees:


A paper should not be recommended for rejection
merely because the referee disagrees with the opinions
or conclusions it contains, unless fallacious reasoning or
experimental error is unmistakably evident.


The judge may be instructed to determine, supposing
D does not formally imply P, whether D formally im-
plies a probabilistic assertion of P, such as “It is fairly
probable that P” or “It is a plausible conjecture that P.”
Such a statement might be implied if, for instance, the
original document D only advanced P as a conjecture, or
if an unqualified assertion of P by D has been weakened
by a critical annotation of the inference judge. If the
judge is to look for such an implication, he must be told
how strong the probability attributed to P must be. For
nonstatistical situations this specification can apparently
only be approximate and perhaps somewhat unclear, as
for the nonmathematical probable inferences referred to
in the preceding paragraph.

In summary, the judge must be told which kinds of
inference to consider (formal logical, statistical, informal
probable) and how strongly P must be supported by
whatever kinds of inference may be used.


BACKGROUND KNOWLEDGE


The inference judge might be instructed to use no back-
ground knowledge in attempting to infer statement P
from document D. However to draw any inference at all
from a document in natural language, he would have to
use his general knowledge of the language. For instance,
he could not otherwise infer “Lethenone is neurotoxic”
(“Lethenone” is a fictitious insecticide name) from such
passages as “This neurotoxicity of Lethenone implies that
it should not ...” or “Is Lethenone neurotoxic to
humans? The answer is definitely in the affirmative.”
Moreover, if he were instructed to use no knowledge of
language but only to look for the word sequence, “Lethe-
none is neurotoxic,” as a guide to inferring that statement,
then he would incorrectly infer it from a document in
which, for instance, the word sequence, “Lethenone is
neurotoxic,” occurs in a series of clauses preceded by
the expression “The following false statements have ap-
peared in the press:” No attempt will be made here to
define the notion “general knowledge of the language.”
Note, however, that such knowledge is not only syntactic
(unless “syntactic” is defined extremely broadly), since it
also includes knowing the specific meanings of such words
as “answer,” “affirmative,” and “false.” How “general
knowledge of the language” should be defined for pur-
poses of request form clarity needs further study.
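
The two Lethenone cases can be made concrete with a deliberately naive word-sequence matcher. The sketch below is an illustration only, with hypothetical names and sample passages adapted from the text; it shows the matcher failing in both directions and is not proposed as a remedy.

```python
TARGET = "Lethenone is neurotoxic"


def naive_match(document: str) -> bool:
    """Infer the target statement whenever its word sequence occurs verbatim."""
    return TARGET in document


# Supports the statement, but never contains the exact word sequence.
PARAPHRASE = (
    "Is Lethenone neurotoxic to humans? "
    "The answer is definitely in the affirmative."
)

# Contains the exact word sequence, but only as a reported falsehood.
FALSE_CONTEXT = (
    "The following false statements have appeared in the press: "
    "Lethenone is neurotoxic, ..."
)

missed_support = not naive_match(PARAPHRASE)      # support is overlooked
spurious_support = naive_match(FALSE_CONTEXT)     # support is wrongly inferred
```

Both failures arise because the matcher uses no knowledge of what "answer," "affirmative," or "false" mean, which is the point made above about general knowledge of the language.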


To infer a statement from a document may require
knowing meanings of particular subject-matter expres-
sions. The expressions might be verbal, for instance,
“neurotoxic” or “nuclear reaction parameters,” or they
might be nonverbal, for instance, equations or diagrams.
Subject-matter knowledge which is not merely of expres-
sion meanings may also be needed to draw an inference.
For example:


A physiologist should [...]
has a slow heart rate and should index Bradycardia
even if the author discusses Heart Rate. A chemist
can justifiably be unaware of that fact (35, p. 29).


The physicist’s function as an analyzer of the [re-
trieval request] data has been [...] to be largely in
[...] areas:

First, to improve the precision of the search re-
quests where improvement was needed, e.g., “advances
in the development of the optical model (experimental
and theoretical)” - since the optical model is a theo-
retical method for calculating cross-sections, angular
distributions and polarizations, the experimental por-
tion of this request would necessarily relate to determi-
nations of these parameters for nuclear and incident
particle energies for which optical model calculations
are valid (23, p. 18).


Suppose document D does not support statement P by
a formal implication or a mathematical statistical
argument, and the inference judge is supposed in such a
case to determine how much D supports P by nonmathe-
matical probable inference. He cannot determine this in a
vacuum, in the way that a formal logical or mathematical
statistical argument can be developed and examined. In-
stead he must have and use a considerable amount of
background knowledge in the subject field(s) of document
D and statement P. For if he infers P from D with some
probability, he must know enough to be assured that there
is no sufficiently plausible way D can be true and P
false. On the other hand, if he judges D to support P
with less than a certain strength, he must be assured that
he has not overlooked some likely enough connection
between D and P which someone knowledgeable in the
field could point out. Thus if the inference judge is
not allowed or not able to use a full range of subject
knowledge (of a not easily specified extent) as back-
ground knowledge to augment document D, he is re-
stricted to formal logic and mathematical statistics in
attempting to infer P from D, even from D augmented
with whatever background knowledge he is able to use.
For similar reasons, if the inference judge cannot use a
full range of subject knowledge (of a not easily specified
extent), his role in deriving an inference basis from D is
limited. He cannot judge for himself the reliability of
passages in D, and he cannot add critical annotations to
D, except those concerning logical or statistical [...]
within D.

Suppose some subject-matter knowledge may be used 
by the inference judge. How clearly can the knowledge he 
may use be specified to him? A list of books would omit
current knowledge (including that which corrects errors
in books), and adding a list of journals would still omit
“invisible college” current knowledge. But the requestor
may want current knowledge from some fields to be used. 
How clearly it can be specified needs further study. A 
different way of specifying what subject knowledge, espe- 
cially knowledge of expression meanings, may be used by 
the inference judge is to provide a “thesaurus” which lists 
technical words and phrases, and gives other technical 
expressions which may replace them. However, a con- 
ventional thesaurus may not be immediately usable by 
an inference judge. For example, a typical thesaurus 
entry is "bananas: use food," but "Monkeys do not re- 
quire food to live" hardly follows from "Monkeys do not 
require bananas to live." Similarly, “Pennsylvania: also
coded United States" does not warrant inferring from
“Harrisburg is the capital of Pennsylvania" that Harris-
burg is the capital of the United States. In general, the
subject knowledge which the inference judge may use
might be specified to him in a thesauric or other codified
form, rather than in natural language texts. But investi-
gation is needed of how this can be done clearly and
correctly.
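
The thesaurus difficulty can be sketched as follows. The sketch assumes, purely for illustration, that each entry is marked with the relation it records and that only entries marked "equivalent" license replacement in arbitrary contexts; that marking scheme, the names, and the third entry (loosely based on the Bradycardia example quoted earlier) are assumptions, not something the paper proposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    term: str
    replacement: str
    relation: str  # hypothetical marking: "equivalent" or "broader"


THESAURUS = [
    Entry("bananas", "food", "broader"),
    Entry("Pennsylvania", "the United States", "broader"),
    Entry("a slow heart rate", "bradycardia", "equivalent"),
]


def naive_rewrites(statement: str) -> List[str]:
    """Apply every matching entry, as an unguarded reading of the thesaurus would."""
    return [
        statement.replace(e.term, e.replacement)
        for e in THESAURUS
        if e.term in statement
    ]


def licensed_rewrites(statement: str) -> List[str]:
    """Apply only entries marked as equivalent in meaning."""
    return [
        statement.replace(e.term, e.replacement)
        for e in THESAURUS
        if e.relation == "equivalent" and e.term in statement
    ]
```

The naive function reproduces both unwarranted inferences from the text, while the restricted one refuses them. Restricting replacement to equivalents is safe but discards broader-term knowledge that would be valid in some contexts, so it is at best a partial answer to the question the text leaves open.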


SOME POSSIBLY CLEAR SUPPORT SPECIFICATIONS


The instructions to the inference judge might be clear 
if they specify, for example, the following: 


Pay no attention to author intent 

Use the whole document as an inference basis (or sec-
tions of it with certain headings such as “Methods”
and “Results”)

Consider only formal implication of P by D (or mathe- 
matical statistical inference deriving P from D with 
at least a specified probability) 

Use as background knowledge only “general knowledge 
of the language” 
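
The four components listed above can be written down as a small record, which makes explicit what an inference instruction has to settle. The field names, the enumeration, and the `possibly_clear` check are assumptions made for this sketch; the check merely restates the list and does not resolve the open question about defining general knowledge of the language.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class InferenceKind(Enum):
    FORMAL = "formal implication"
    STATISTICAL = "mathematical statistical inference"
    INFORMAL = "nonmathematical probable inference"


GENERAL_LANGUAGE = "general knowledge of the language"


@dataclass(frozen=True)
class SupportSpecification:
    """An instruction to the inference judge (hypothetical encoding)."""
    require_author_intent: bool
    basis_sections: Optional[Tuple[str, ...]]  # None means the whole document
    inference_kinds: Tuple[InferenceKind, ...]
    min_probability: Optional[float]           # needed for STATISTICAL only
    background: str

    def possibly_clear(self) -> bool:
        """True if the specification stays within the list given in the text."""
        if self.require_author_intent:
            return False
        if InferenceKind.INFORMAL in self.inference_kinds:
            return False
        if (InferenceKind.STATISTICAL in self.inference_kinds
                and self.min_probability is None):
            return False
        return self.background == GENERAL_LANGUAGE


clear_spec = SupportSpecification(
    require_author_intent=False,
    basis_sections=("Methods", "Results"),
    inference_kinds=(InferenceKind.FORMAL,),
    min_probability=None,
    background=GENERAL_LANGUAGE,
)

unclear_spec = SupportSpecification(
    require_author_intent=False,
    basis_sections=None,
    inference_kinds=(InferenceKind.INFORMAL,),
    min_probability=None,
    background="any physics",
)
```

The second specification corresponds roughly to the composite instruction quoted at the start of Part 3 ("use any physics as background knowledge"), which falls outside the possibly clear cases.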


In the sentence introducing this list, “might” can be changed to
“will” if a definition of “general knowledge of the lan-
guage” can be formulated which is clear enough to pre-
vent disagreements among competent judges. This needs
further study.


4. Some Remarks


A subject document retrieval system which provides 
documents supporting specified kinds of statements in 
specified ways is not a “statement retrieval” or “fact 
retrieval” system. For a statement retrieval system pro-
vides statements of specified kinds which are supported
in some way (often system-specified) by the corpus of 
the system. A document retrieval system, on the other 
hand, provides documents from which the requestor or 
some other reader external to the system must infer the 
statements of specified kinds. Thus, a document retrieval
system does less than a statement retrieval system. On
the other hand, suppose a document retrieval system suc-
cessfully permits a requestor more freedom in specifying
the kind of support for statements which he wants than
does a statement retrieval system. Then in this dimen-
sion the statement retrieval system does less than the
document retrieval system.

A basic emphasis of this paper has been that clear re-
trieval requests are important. But it is often asserted
that a searcher is unavoidably “unclear” about what he
wants, especially if he is searching in a somewhat un-
familiar field, as is often the case. Therefore, it is also
asserted, he needs the assistance of a retrieval system 
expert, a cross-reference system, search cycling, or all 
three. However, there need be no inconsistency between 
these two viewpoints. If it is generally possible to formu- 
late clear requests, this should help rather than hinder 
dealing with an initial uncertainty about what the re- 
questor wants. In this context it should be noted that 
“unclear about what he wants” is ambiguous between, 
for instance, not formulating a clear question request and 
not knowing what answers will be supported by docu-
ments satisfying a clear question request. This ambiguity
seems often to be overlooked in documentation literature. 

Several remarks might be made about the kinds of un- 
clear requests described in Part 1. Subject retrieval sys- 
tems usually discourage clear language, for request form 
meanings are unclear; however, if request forms have 
clear meanings, then ambiguous subject expressions and 
syntactic ambiguity may be no more frequent in retrieval 
requests than in other kinds of scientific communication. 
“Vagueness of scope” in Part 1 referred to, roughly speak- 
ing, how indirectly a document may say something 
about a requested subject; a request does not have such 
an obscurity if it specifies what kinds of document-state- 
ment inference, especially what kinds of background 
knowledge, may be used. How “user background” re- 
quirements in a request can be formulated clearly needs 
very much study; the concepts of statement kind and 
document-statement inference kind may be more helpful 
in such a study than the usual documentation language 
of subjects; for example, a requestor might be better able 
to indicate what will not be new to him in terms of kinds 
of statements than in terms of subjects. “Reading be- 
tween the lines” of a request seemingly unavoidably leads 
to differing interpretations of the request; to prevent 
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an. documents retrieved to satisfy a — formu- > . 8. — C., — nensi for Chemical “Imjórma- I 


‘lated “réquest should be kept separaté. from -additional * ~ P ¿tion ' and ` Data System. Report R-1755,: Frankford. 
.documents- volunteered as a, result ot^ ada between K - Arsenal; Philadelphia (April 1965). — ^ : 
‘the lines”. of the request... ieee. I pU E Gurr, C. D., Seyen Years of Work on the Organization” . 


. of Materials in the Special Library, American Docu- 
| mentation, 7 (No. 4) : 320-329 (1956). Puno A ee 
Des "WAYNE,. LA 'Survéy. of Users of the Aman Soctély z 


In the physics tetrieval experiment of Swaiison dt aL 
" (6); there was little disagreement among relevance , 


` , judges, and. a special. attempt was made tó; "formulate ; < y Pus M tals, Western Reserve, Univetiity Searching 

- requests clearly, e However, the requésts were not.ae- +... Service, Bureau of Social, Science Research; Washing- 
— by inference instructions, thé question requests’ . 2. ` ton, D. C, 1962. - E. 
° (as most of them were) were not accompanied by súbsti-- 8 Swanson’ D. Searching Natúral — Ho by. ` 
` tution. condition ' statements, and "there wag: only &' ok * Computer, | Science; ame (89: A (Osteber o 
sketchy indication of "user background" requirements
("the context of a college examination"). Presumably in
this experiment there was enough unstated intellectual
accord among the relevance judges that such explicit
formulations were unnecessary. It is not clear whether
this was because they were all nuclear physicists, or for
more specific reasons. In any case, such prior intellectual
rapport cannot always be safely assumed. When it cannot,
more explicit formulations of requests are needed.

The discussion in Parts 2 and 3 centered on statement
types and document-statement inference types. It is
suggested that, for a number of documentation problems,
these concepts may be clearer and more helpful than the
usual documentation notions such as subjects, relations
between subjects, pertinence of subjects to documents,
etc. For instance, they may be more useful in formulating
rules to guide subject indexers. The distinctions among
various kinds of document-statement support may help
clarify the dispute about how much subject knowledge
indexers must have. To the extent that the ideas in
Parts 2 and 3 will permit clear subject requests to be
formulated, the testing of retrieval systems might be less
complicated by relevance disagreements. And the statement
and inference concepts may be more helpful than
traditional documentation notions in specifying what a
computerized retrieval system is to do. If any or all of
these suggestions turn out to be true, it should not be
too surprising. For in the sciences it appears that a
document is usually important to a requestor because it makes
certain statements and provides a basis for inferring
certain other statements. He cannot use subjects from the
literature in his work, but he can use statements.

A Study of the Use of Materials Circulated from an Engineering
Library, March to May 1956

The purpose of this study, undertaken from March to
May 1956, was to determine how an academic engineering
library was used by two groups of users, the
undergraduates and the graduate students-faculty. The
part of the study reported here is the result of a
questionnaire given to the user at the time he charged out
an item at the circulation desk to ascertain for what
purpose he selected the item(s) and how he learned
about it as a source of information. The reasons for
selecting items varied over the three periods in which
the data were collected: the undergraduates […]
progressed, while the graduate-faculty group borrowed
increasingly for this reason. Supporting the conclusions
of the other studies, the most important source for
learning about an item was personal recommendation;
however, one of every four items charged out was
discovered in browsing through the library's collection.
From this study one can conclude that not only must
librarians be acquainted with their users as individuals,
but that the physical arrangement of library materials is
an important factor in […] to information in an academic
environment.

The increase in the amount and variety of published
materials during the last decade has been the subject
of considerable discussion by college and university
librarians. An equally dramatic, but perhaps less
discussed and appreciated, problem is the change that
[colleges] and universities have undergone since the war
and the changes which will have to take place if those
educational institutions are going to cope with the growth
of our population and our material culture. The curricula
of our schools have to be planned to educate students to
[…] a technological world that was hardly believed possible
even fifteen years ago. Chemicals that were only a test
tube curiosity a short time ago are now part of our
daily lives. The majority of medicines now sold over
the drug counters were not listed in the official
Pharmacopoeias ten years ago. The science of electronics
has changed the social pattern of our homes through […]

[…] The College of Engineering at the University of
Wisconsin, which is certainly not atypical, has had an
increase of 600 percent in its graduate enrollment,
while the undergraduates have almost doubled in the
past ten years. Similarly the number of research
projects presently being carried on is over 200 as compared
to approximately 20 in 1939.

These changes in the curriculum, student body, and […]
have profoundly affected the organization and
services provided by the Engineering Library. New
emphases and new organization have become mandatory.
Present space is inadequate. But because plans are now
under consideration for the construction of a new
library (still some four to six years hence), expensive
renovation would not be accepted budget-wise. Any
expenditures for capital equipment must be, or at least […]

questions, a two-year study was planned to determine
if data could be obtained to solve some of these
problems with assurance rather than just by empirical
guesswork.


* Description of Project


In organizing the project, the functions of the library 
were divided into two aspects: (a) from the viewpoint 
of the librarians, and (b) from the viewpoint of the 
users. Or, to phrase it in another way, the functions 
were divided to determine the efficiency of the techniques
and routines and the more intangible aspects of
determining how the library was understood and used by
the patrons. 

For the first part of the study, arrangements were 
made with the instructors of the Time and Motion 
Study classes to use the library as a laboratory. At the
time of this writing the study of the faculty and students
has pointed up many practices which can be carried on
more efficiently. However, it is still too early in the
project to obtain information on general methodology
and techniques.


An examination of the Engineering Library showed
that the way in which the collection was used could
be divided into three categories: (a) the reserve and
reference collection, (b) the materials used within the
library other than those items on reserve or reference,
and (c) the materials checked out for use outside the
library. It is with the latter category this paper is
concerned.

This aspect was chosen first for study for several 
reasons. How the patrons become acquainted with and 
for what purpose they use the reserve collection are 
fairly well known. The more important question was 
how to provide improved service for the reserve collec- 
tion, yet integrate the actual work with other aspects 
of service that need to be improved or expanded. The 
physical arrangement of the library and the shortage of 
personnel precluded any attempt to survey the use made 
of materials within the library at this time. Second, it 
was felt that studying this aspect first would give a good 
view of the patron approach to the library. Herner's 
study on the "Information Gathering Habits of Workers
in Pure and Applied Science" (1) showed that of the
600 scientists interviewed at Johns Hopkins, only 11 percent
did most of their reading in libraries even though
42 percent of this group depended primarily on the
library for published materials and 49 percent depended
equally on the library and their personal collections.
Herner also points out that engineers did the least amount
of reading in the library. Bernal (2) found that 56 percent
of the scientists responding to a questionnaire obtain
their materials from a library and that one-third of the
papers studied were taken out of the library. Although
the findings of Herner and Bernal cannot be applied
directly to the situation under observation here, the
statistics they give do indicate that data obtained from
individuals checking out books and other materials from
the library would reveal to a considerable extent the way
the library collection is used.


* Method of Study


In selecting a method for gathering data two possibili- 
ties were considered: interviews and questionnaires. Al- 
though interviews are decidedly the better method for ob- 
taining information for this type of survey, insufficient 
staff prevented us from adopting this method exclusively.
If mailed questionnaires are used, there is always the
danger of an unrepresentative sample. The method finally
selected was comparatively simple and direct. A
questionnaire was prepared (see Table 1) which was given
to the patron at the time he was charging out an item.
The patron then could ask questions if there was doubt
in his mind about how to fill it out, and the patron could 
be interviewed on his responses. This method had the 
advantages of a questionnaire where answers are con- 
sistent; the ambiguities could be clarified through an 
interview, and the person filling out the questionnaire did 
not have to rely on his memory in giving his answers— 


TABLE 1. Data obtained and questions asked

You have selected this item you are now charging out
___ I. for classroom work
___ II. because it is applicable to your research project
        or thesis
___ III. to provide you with information in or allied to
        your major field
___ IV. because it is of interest to you for reasons other
        than those above

You became acquainted with this item
    through discussions
___ V. with members of your department or project
___ VI. with professional contacts other than a member
        of your department or project
___ VII. with someone other than a person connected
        with your work
___ VIII. through an advertisement, review, or book
        announcement
___ IX. through browsing in the library collection
    through a bibliography
___ X. you found in a book
___ XI. you found in a magazine
___ XII. especially prepared for a class or a subject
        field
___ XIII. through a subject heading in the Engineering
        Library Card Catalog
___ XIV. through some regularly published index or
        abstracting publication, e.g. the Engineering
        Index, Chemical Abstracts, etc.

he was questioned on purposes and methods at the
time he was involved in selecting an item. This avoided
the necessity of asking an interviewee to "guess" in his
answers. In spite of this apparently convenient
arrangement, more than a few questionnaires had to be
discarded because patrons disappeared before ambiguous
answers could be clarified through an interview.

Because the patron may charge out materials without
the assistance of library personnel at the Engineering
Library, not every individual who charged out material
was asked to complete the questionnaire. Consequently
considerable care was taken to acquire a "random"
selection of individuals. During the ten-week period of the
survey (March 6 to May 15, 1956) every effort was made
to get questionnaires throughout the time the library was
open. Only two individuals refused to fill out the
questionnaire. The reason they gave in each instance was that
they were due at a class.

The questionnaire was given only to [individuals]
connected with the University of Wisconsin. Materials
charged out to individuals, organizations, or institutions
other than those affiliated with the University constitute
less than 5 percent of the total non-reserve book
circulation. Further it was felt that materials circulated outside
the University could be ignored because the primary
concern was to determine how to improve the Engineering
[Library] to meet the needs of the [student] body and
[faculty].

* […] Affecting the Use of the Library

To get a [view] of the [results] of this survey, the
policies on the use of the library and its physical
organization must be understood. As was already mentioned, the
patron is permitted to charge out materials without the
assistance of library personnel. All items in the
Engineering Library may be borrowed for a period of one month
(faculty members have a six months' borrowing privilege)
except those items on the reserve and reference shelves
and current periodicals. Even the latter may be kept for
a month or longer if the patron requests a loan of that
length of time and no other individual asks for the item.
The library has open stacks and the only items which
the patron cannot obtain himself are microfilms, which
are kept in a room with the microfilm reader, and
the librarian's reference works, which are kept in the
librarian's office. There is nothing sacrosanct about the
librarian's office or the microfilm reading room, and once
the patron is aware of the physical location of these
items, he may use them without any special permission
from the library staff. Although these two categories of
materials are the only ones not "readily available" to
the library patron, the groupings and arrangement of
other materials do have an effect on their use.
Librarians often forget that the patron must [know] a great
deal about the organization of a library [before]
he can use it effectively; the librarian's view of the
physical organization is that it is logical and sensible and
consequently obvious. The organization of materials may
not be obvious to the patron if he is expected to locate
items for himself. Other than the two categories
mentioned above, the patrons of the Engineering Library
must be acquainted with the following groups.

1. Periodicals and continuations. Bound periodicals and
continuations are arranged alphabetically by the
most recent titles (Cutter numbers are assigned each file).
Unbound periodicals are kept on a separate set of shelves
until ready for binding. The continuations on the other
hand are placed in pamphlet boxes beside the bound
numbers. Not all publications which might be defined as a
periodical or continuation are in this collection; some are
classified and kept with the "book" collection.



2. Books. The books are divided into three groups:
(a) the new and/or uncataloged books, (b) the books in
the Cutter classification, and (c) those in the Library of
Congress classification. The University of Wisconsin
libraries changed to the Library of Congress classification
system in 1953. From the librarian's viewpoint, the
Public Catalog cards are clearly marked to show in which
group a book may be found, but the distinction is far
from obvious to many patrons.

3. Reference and reserve materials. These are [kept]
in one area. Only the main entry card in the Public
Catalog gives the location of these items.

4. Indexes and abstracts. All of the current indexing
publications the library subscribes to are located in this
special grouping. Only the publications devoted entirely,
or almost entirely, to the publishing of abstracts are
placed in this category; e.g., the abstracts published as
a part of a professional [periodical are] bound and kept with
that periodical.

5. Theses. The theses are [kept] separately and a
[separate] catalog is maintained for them.

6. Pamphlet collection. Materials in this category are
placed in pamphlet boxes. There are subject references
[to this] collection in the [Public] Catalog.

When a patron charged out more than one item at a
time, the responses obtained from the questionnaire
were treated as a unit, unless the patron indicated the
items he was charging out were to be used for different
purposes or he became acquainted with them through
different means. We were interested in determining how
the individual patron used the library rather than how
separate items of the library were used or located.


Often the patron became acquainted with an item
through a series of steps. For example, a patron, while
scanning a current periodical, [found] a reference to a
book in the bibliography of an article. He located the
book in the library, but discovered it would not be of
interest to him, but on browsing through the books
classified in the same group, he found one that did interest
him. In such an instance, the last step was the one used
in scoring the questionnaire.
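
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and using the Roman numerals of Table 1 as string codes is a convenience for the example, not something taken from the study.

```python
from collections import Counter


def score_source(steps):
    """Return the source credited on a questionnaire.

    `steps` lists, in order, the Table 1 codes for each way the patron
    moved toward the item; only the final step is scored.
    (Hypothetical helper, assumed for illustration.)
    """
    if not steps:
        raise ValueError("a questionnaire needs at least one step")
    return steps[-1]


def tally_sources(questionnaires):
    """Count credited sources over a list of step chains."""
    return Counter(score_source(steps) for steps in questionnaires)


# The example from the text: a bibliography in a periodical (XI) led to a
# book that proved unsuitable, and browsing nearby (IX) found the item.
example = ["XI", "IX"]
print(score_source(example))
```

Under this rule the example is credited to browsing (IX), which is how the text says such a questionnaire was scored.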


Because of difficulties in distinguishing clearly between
faculty and graduate students, the responses of these two
groups were combined. This difficulty arises from the
fact that research assistants, fellows, etc., have the same
privileges as faculty members in the use of the library.
Oftentimes the graduate would not indicate his faculty
status or a faculty member did not indicate that he was
doing graduate study.

Although no one who filled out the questionnaire ques- 
tioned the meaning of "research," the scoring of the 
questionnaire might be confusing without some further 
definition. A large number of undergraduates indicated 
that the reason for selecting an item was because it was 
applicable to their research or thesis. The only under- 
graduate group in the College of Engineering required 
to write a thesis are those in Civil Engineering. For the 
thesis to be accepted it is not necessary that “original” 
research be done as might be defined for a thesis to com- 
plete the requirements for an M.A. or Ph.D. degree. How- 
ever, this ambiguity in the interpretation of the meaning 
of research is lessened if it is realized that the
questionnaire attempted to obtain individual responses and that
research, even in Engineering, does not necessarily always 
have to include experimentation. The undergraduate stu- 
dent who prepares a paper as a requirement for his class 
work may be doing research in the sense that he is relat- 
ing ideas and concepts which are new to him, and for that 
matter may never have been related before. Because few 
papers prepared by undergraduates are published does 
not mean that the ideas in them are that much less valid 
or important than many of the papers that do get pub- 
lished. To the undergraduate, his paper is a means of 
classifying and organizing knowledge which is the major 
part of any research project. Since our purpose was to 
try to gain some insight into how the user viewed the 
functions of the library in relation to his own work, a
more rigid definition of research did not seem necessary. 

The questionnaires were grouped into three time pe- 
riods, the first of four weeks and the last two periods of 
three weeks. The four-week period included the spring 
recess. 


* Analysis of Data 


The only other survey known to have been published
that might be compared to this one is that of Urquhart
at the Science Museum Library in England (3). A
questionnaire was sent only to those individuals borrowing
through the mails, that is, on an interlibrary loan basis.
Since the two purposes of this survey were to determine
for what purpose the items checked out were used, and
where the patrons obtained their reference, the studies
of Bernal and Herner previously mentioned are not
directly applicable. They were more interested in
evaluating literature research techniques and determining the
relative usefulness of research information and reference
services. However, a comparison is made between the
findings of this survey and those of Urquhart, Bernal, and
Herner whenever it seemed applicable.

A total of 371 questionnaires were completed and of
this number 173 were completed by undergraduates and
198 by graduates and faculty members. Thus, over 50%
of the circulation of the library is to this latter group;
162 of the questionnaires were completed by individuals
who had previously filled out a questionnaire.
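
As a quick check of the figures just given, the shares can be recomputed from the reported counts. The sketch below is illustrative only; the helper name `share` is an assumption, while the counts are those stated in the text.

```python
# Counts reported in the text.
TOTAL = 371
UNDERGRADUATE = 173
GRADUATE_FACULTY = 198
REPEAT_RESPONDENTS = 162


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two groups account for every completed questionnaire.
assert UNDERGRADUATE + GRADUATE_FACULTY == TOTAL

print("undergraduate:", share(UNDERGRADUATE, TOTAL))
print("graduate-faculty:", share(GRADUATE_FACULTY, TOTAL))
print("repeat respondents:", share(REPEAT_RESPONDENTS, TOTAL))
```

The graduate-faculty share comes to a little over 53 percent, consistent with the statement that over 50% of the circulation is to this group.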

The Roman numerals in the following sections cor- 
respond to the numbers in the questionnaire of Table 1. 

See Fig. 1 for a graphic presentation of responses. 


REASON FOR SELECTING ITEM


I. For classroom work. It is not surprising that
undergraduates indicated that 47 percent of the materials
they withdrew were for classroom work as compared with
only 28 percent for the graduate-faculty group. To those
who feel undergraduates do not engage in research the
[figure] would be 77 percent. The responses between the
three periods decreased 5 percent and then 10 percent
as the semester progressed with the undergraduate group,
whereas with the faculty-graduate group this reason
increased by 8 percent and 10 percent. This may be due
to the individual definitions of research: to
the undergraduate the completion of a semester paper
implied research, whereas to the graduate student it
meant class work.

II. Applicable to research project or thesis. 30 percent
of the undergraduates and 37 percent of the
graduate-faculty group gave as their reason for selecting an
item that it assisted them in their research. The
significance here may be that one-third of the circulation of the
library is directly concerned with the research projects
of its patrons. In both groups this reason for selecting
an item increased during the three periods: 5 percent and
10 percent increases for the undergraduates, 10 percent
and 16 percent increases for the graduate-faculty group.

III. Information allied to the major field of interest.
In the first period two of every five items checked out by
the graduate-faculty group were for collateral reading in
their major field. By the next period the ratio had
dropped to one to five and in the last period to less than
one to six. The literature borrowed from the Science
Museum Library was used mainly for "Technical
Development Work" and "General Information" purposes.
If these terms can be interpreted to mean that the
materials drawn out were for the same purposes as referred
to in this questionnaire, the average figure of 20 percent
found in this survey is not inconsistent with that of
Urquhart's. Whether this gradual decrease in the amount
of materials drawn out of the library for collateral reading
is an indication of the research study habits of the
patrons cannot of course be definitively concluded, but it
does reveal a periodicity which parallels the work
ordinarily experienced during a semester.

The undergraduate group did not show much variance
between the three periods. In the first period this reason
constituted 14 percent of the responses and […] percent in
the last period.



IV. For reasons other than I, II, or III. This
response was elicited approximately 10 percent of the time
during the three periods by both groups. Although no
statistical record was kept of the subject matter
circulated, it was clear in this case that the ownership of
an automobile, hi-fi set, or television set was the reason
behind the search for materials in this category in many
instances.

Summary. Ten percent of the circulation was for
reasons not connected with either research or class work. The
graduate-faculty group charged out items in increasing
amounts during the semester for class work and because
the items were applicable to their research program. For
this same group, collateral information to research or the
major field of interest declined dramatically as a reason
for selecting an item as the semester progressed.
Sixty-two percent of the circulation of the graduate-faculty
group were for reasons II and III. While increasing
numbers of the graduate-faculty group obtained materials
for classroom work, the undergraduates gave this reason
fewer times toward the end of the semester.



V, VI, VII. Personal recommendation. This was […]
the most important reference source for both the
undergraduates and the graduate-faculty group. Interestingly, no
graduate or faculty member and only four of nineteen
undergraduates gave this source when they indicated
reason IV for selecting an item. As the semester advanced,
this verbal source became increasingly important except
for items selected for research by the graduate-faculty
group. The undergraduates, on the other hand, indicated
that at the beginning of the semester (two out of five
instances) discussions with members of their department
were the means of becoming acquainted with the item;
by the last period the ratio had risen to two out of three.
Two reasons might be given: the contact between
individuals becomes more friendly over a period of time
so that more discussions take place, and, as the semester
advances, the student becomes more familiar with the
subject matter he is studying to permit him to ask
questions about areas he does not understand or he wants to
investigate further. Urquhart's study shows that verbal
recommendation ranked third as a source of reference,
constituting 16 percent of all circulated materials.
Similarly, Bernal found this means to be the third in rank as
a source of reference although the percentage was slightly
lower (14 percent). Herner's figures are not directly
comparable. […] information of the group he interviewed were
obtained through verbal sources. But of the sources of
information from literature that are "indirect sources of
information," the estimate of 19 percent was given for
personal recommendations as a means for learning about
materials.

However, the scientists at Johns Hopkins listed personal
recommendations as the most important "indirect source"
of information, with the School of Engineering faculty
ranking personal recommendation as second.

Verbal recommendations from other than members of
a department or project were indicated relatively few
times, 7 percent for professional contacts and 3 percent
for contacts other than those connected with the patron's
work. Since there were so few responses for these two
sources and because they were so scattered, no
particular trend or significance statistically was observable.
However, in six of the ten instances when the patron
indicated the personal source of information was someone
other than a member of his department or project or one
of his own profession, the source was the librarian.

VIII. Through advertisement, book announcement, or
review. This was clearly the least important of the
sources of reference. No undergraduate gave this as a
source. Of the graduate-faculty group, […]
The Engineering Library distributed only one accession
list during this period. Herner found that the most ap- 
preciated library reference service was the publishing of 
accession and selected reading lists, but he does not give 
any statistics as to their relative use. The Science
Museum Library survey showed accession lists to be the
least important as a source of information. The Agri- 
culture librarians at the University of Wisconsin report 
a considerable response from the distribution of their 
“new book list.” They attribute this success to regular 
distribution of the list and the system of circulating the 
new books. Whether this source of reference is truly not 
considered a useful source by the patrons of the Engineer- 
ing Library, or whether other factors are involved, needs 
further study. 

There is evidence that faculty members do read book
reviews and publishers' announcements because of their
requests for purchase. Book announcements and reviews
were seemingly of little use to the Johns Hopkins scientists
as they listed this source last. Neither of the British
studies gave this as a possible source.


IX. Browsing through the library collection. One of
every four items was learned about through browsing.
It ranked second to personal recommendation. For the
graduate-faculty group this ratio was relatively
consistent throughout the period under observation. The
undergraduates, however, spent less and less time
browsing toward the end of the semester. Approximately one
out of four items found through browsing was used as a
source of information for the major field of interest,
although the graduate-faculty group indicated that 19 percent
of the materials they found in this manner were
applicable to their research.

Since all the items charged out at the Science Museum
Library in Urquhart's study were solicited by mail,
browsing could not be considered as a source of reference.
In only one of the questionnaires prepared by Scates and
Yeomans (4) was this possibility for learning about
graphic materials brought out. The question was so
framed that it cannot be compared with the situation
under study here.

X, XI, XII. Through bibliographies. Only one
undergraduate indicated that his source of reference was a
bibliography in a periodical. From counts made of in- 
dividuals examining current periodicals, few undergradu- 
ates spend time reading the periodicals available in the 
Engineering Library; consequently, it is not surprising 
that undergraduates did not give this as an important 
means for learning about literature. Four students gave 
their source of information as a bibliography in a book. 
Because of the wording of response 9, there is no way of 
knowing what type of bibliography the 12 students used, 
but from the information obtained from several inter- 
views, the bibliographies, in most instances, were those 
prepared by their instructors. These responses were too 
few and too scattered to show any trends. 

The graduate-faculty group indicated that their source
of information was a bibliography in one out of four
instances. Of the total 24 percent, 10 percent were from
books, 12 percent from periodicals, and 2 percent from 
especially prepared bibliographies. Bibliographies found 
in periodicals became an increasingly important source 
of reference toward the end of the semester, especially 
for items applicable to the research or thesis of the 
patron. 

References cited in literature were the most frequent 
source of information for the individuals questioned in 
the British studies (37 percent by Bernal and 38 percent 
by Urquhart). The Johns Hopkins’ scientists gave one- 
third their votes to this means (14 percent for bibliog- 
raphies and 19 percent for books and papers). 

XIII. Subject heading in the card catalog. The under- 
graduates found one-fifth of their materials through the 
subject card catalog, and it was used almost entirely for 
finding material for classwork and for their research. It 
was consulted only a few times for finding general in- 
formation and only once when the patron was looking 
for material not connected with his school work. The use 
of the subject card catalog increased as the semester 
progressed. 

The graduate-faculty group specified the subject card
catalog as a source in only 9 percent of the cases. This
total, however, is misleading: during the first period
16 percent of the class work items, 8 percent of the
research items, and 22 percent of the items for general
interest were found through the card catalog. By the
third period, however, only one person in this group 
admitted using the card catalog—and he used it to get 
allied information for his thesis. Apparently the more 
advanced the work of a graduate student or faculty mem- 
ber, the less he depends upon the subject card catalog. 

Herner found that the Johns Hopkins’ scientists used
the card catalog in about the same percentage as the
graduate-faculty group did in this study. Subject card
catalogs were not listed as a source of reference in the 
British studies. 

XIV. Indexes and abstracts. From the responses on 
the questionnaires, indexes and abstracts were used very 
little as a means of becoming acquainted with literature.
However, an actual count of individuals using the abstracts
and indexes indicates that they are used as a
reference source more frequently than this survey shows 
(6). Indexes and abstracts deal almost entirely with 
periodicals. Periodical references, when found through 
this means, are read in the library and only the especially 
useful or pertinent item is charged out. In all three of 
the studies previously mentioned, indexes and abstracts 
ranked near the top as the most useful reference source. 

Summary. Decidedly, verbal recommendation plays an 
important role in informing the patrons that there is 
literature of interest to them in the Engineering Library. 
Over one-third of the items charged out were recom- 
mended to the patron by someone. Over one-fourth of 
the items circulated were learned about through browsing 
in the library collection. Bibliographic references are a 
relatively unimportant device for the undergraduate,
whereas one-fourth of the items charged out by the
graduate-faculty group were discovered through this
means. Book reviews and accession lists do not seem to
be influential in increasing the library’s circulation.
One-fifth of the items charged out by the undergraduates were
discovered through the subject card catalog. The subject
card catalog seems to play a relatively minor role for the
graduates and faculty members as their work advances.

Conclusions.


The data obtained from this survey cannot be used
to make generalizations for all libraries or even for college
engineering libraries. Two important failings are
observable in this method of survey: (a) that only certain
aspects of the total library operation were under
observation; and (b) a statistical method, while an important
means for interpretation of situations, can never
convey individual differences that are so important where
human preferences are involved. At best a statistical approach
to problems can reveal “general trends” and indications
of directions for action. In this respect this
survey has emphasized some well-known, but often not
considered facts about library practice.

1. There were as many graduates and faculty members
using the library’s non-reserve collection as undergraduates,
although statistically they are a very much smaller
group in the College of Engineering.

2. The most important source of reference to materials
in the library was through verbal recommendation. This
has important implications for the librarian if he is to
take part in the “information gathering habits” of his
patrons. First, the librarian must be acquainted with the
subject fields of his patrons. This is perhaps obvious, but
it is not unimportant because it is obvious. A librarian
cannot discuss the literature of a subject field without
knowing something about the subject field. Second, the
librarian must be acquainted with his patrons as
individuals. One ordinarily does not discuss even abstract
and impersonal matters with a stranger.

3. Because of the relative importance of browsing in
the library, the physical arrangement of the materials
undoubtedly is an important factor. In other words, the
“availability” of the library collection affects how
it will be used.

4. Perhaps more important than any other result
that has come from this survey has been the personal
insight that was given to the librarians at the Engineering
Library with regard to their own work in relation to the
work of the staff and students of the College of Engineering.
It has brought to attention and emphasized the
interdependence of teaching and research and the functioning
of the library.

References

1. HERNER, S., Information gathering habits of workers in
pure and applied science, Industrial and Engineering
Chemistry, 46: 228-236 (1954).
2. BERNAL, J. D., Preliminary analysis of pilot questionnaire
on the use of scientific literature, in The Royal
Society Scientific Information Conference, 21 June-2
July, 1948, Report, The Royal Society, London,
1948, pp. 589-637.
3. URQUHART, D. J., The distribution and use of scientific
and technical information, in The Royal Society
Scientific Information Conference, 21 June-2 July,
1948, Report, The Royal Society, London, 1948, pp.
408-419. Reprinted from Journal of Documentation,
3: 222-231.
4. SCATES, D. E. and A. V. YEOMANS, Activities of Employed
Scientists and Engineers for Keeping Currently
Informed in Their Fields of Work, American
Council on Education, Washington, D. C., 1950.


` B. SCRUBRIN, ‘A.W: and C; A. ‘Scuaum, unpublished Ye | 


port, ur 31, 1956: 




Letters to the Editor


Dear Sir: 


More concerning full names vs. initials as per letters to
the editor [American Documentation, January 1967] re:
W. T. Brandhorst. My approach is, why worry? Establish
a standard similar to the military, i.e., “sure everybody has
a name, buddy, but here you are just a number.” So, what
better identifier for Joe Author than his Social Security
number? Editors will not accept for publication any document
where the author has not included his SS#. (Hmm!
better make that Social Security number). Those who want
to file from now on, file by number, x-ref to a desk-top
author-number list, and who cares how you sign your name!

DANIEL M. SIMMS
Mobil Oil Corp.
Dallas, Texas


Dear Sir: 


In the paper presented by Barbara A. Montague to the
1964 meeting of the ADI, it is claimed that three systems of
indexing were compared, A and B being “co-ordinate
indexes,” and C a “classification index.” This exercise seems
to me to be one of the most carefully constructed and executed
of any that I have seen in this field, and its results, I
find, are not only convincing but also, unlike
many other such tests, relevant to the task of information
retrieval in real life.

It is the more the pity, therefore, that Miss Montague
was not more explicit in describing her three systems,
because her conclusions are liable to give an utterly false
impression. What made system A superior to system B were
(1) its system of vocabulary control, (2) its ability to provide
for generic search, and (3) its superior use of roles. As
to (2), I should like to emphasize Miss Montague’s own
point, that “the main factor responsible for irrelevance in
system A occurred in one question for which selective
generic class was not available, and the classes which had
to be searched included concepts unrelated to the question”
[my italics].

Now system C is described only as a “subject index with
one or two cross references on abstract cards.” This corresponded
to no “classification index” that I have ever
heard of. A “classification index,” I should have thought, was
the alphabetical list of terms, with their class numbers, that
one finds at the end of the classification schedules, at least
in all the schemes I know. Yet, in the conclusions, this
apparently alphabetical list, so primitive that it has only one
or two cross references, has become “a classification system.”
The next thing we shall have is the claim becoming
accepted that Miss Montague has “proved” that co-ordinate
indexing is superior to classification for information retrieval.
What, I should like to ask, is system A’s ability to
provide for generic search if it is not classification? What
does Miss Montague imagine a faceted classification to be
if it is not a controlled vocabulary, with generic structure,
with precise roles (these are the facets), and also with
selective generic classes for all its terms? Any co-ordinate
index that departs from mere alphabetical listing of terms,
and introduces generic structures and roles, has become a
classification system.

I should not write with such emphasis if I did not estimate
highly the value of Miss Montague’s work. I agree
with her views on deep indexing and the use of roles; I
agree with her views on links, except that I wonder whether
a very selective use of links, by a skilled indexer (and
“when in doubt, leave it out”), might not sometimes improve
performance; and I would emphatically endorse the
implicit conclusion that the system which costs more at input
but less at output is the better. I should grieve to see a
paper of this quality cited as evidence for claims which it
actually proves to be false.


D. J. FOSKETT
University of Ibadan
Ibadan, Nigeria


Dear Sir: 


I appreciated the opportunity to read Professor Foskett’s
penetrating analysis of my paper, “Testing, Comparison, and
Evaluation of Recall, Relevance, and Cost of Coordinate
Indexing with Links and Roles,” which appeared in the
July 1965 issue of American Documentation.

Professor Foskett fears that readers of this paper may misinterpret
my statement that the coordinate index, System A,
performed better than System C, which was a classification
system, and extrapolate this to mean that coordinate index- 
ing is superior to all classification systems. Indeed, this was 
not my intention and, being aware of such potential misin- 
terpretation, I carefully worded my statement as follows in 
the Summary and Conclusions: 


The comparison of a coordinate index with a classification
system shows that, for the two systems tested, coordinate
indexing provides faster searching and retrieves more
relevant references, and that the cost of coordinate indexing
is higher at input and less for searching.


The objective of the test reported in the paper was to
evaluate the relative effectiveness of the two systems. System C,
primitive though it may have been, was the practical
search tool actively used by the patent attorneys themselves
at that time, and the success of its performance depended in
large part on the familiarity in depth of the body of art by
the users. It was evident from this test that the performance
of the coordinate index System A justified the cost of
input and was superior to the classification system in use
at that time.

It is obvious from Professor Foskett’s letter that we share 
common views in improving the understanding of a wide 
variety of approaches in use today for the storage and re- 
trieval of information. It is this writer's hope that oppor- 
tunity will arise in the future for personal discussion with
Professor Foskett in these areas of mutual concern. 


BARBARA A. MONTAGUE 
Information Systems Division 
E. I. du Pont de Nemours & Co. 
Wilmington, Delaware 




Dear Sir: 


“Computer” and “Boolean” were magic words for many
in documentation during the fifties. Since then we have
learned a good deal more about what electronics and symbolic
logic can and cannot do for documentation. Now
“behavioral” and “psychological” seem to have become
magic words for some who write about documentation
problems. A striking example appeared in the American
Psychologist, November 1966:

Many of the missions in our society have technological
goals, such as sending a manned space vehicle to the
moon or producing efficient thermonuclear power. The
information science mission, however, is to facilitate a
peculiarly human intellectual behavior; namely, research
itself. Because of this essential difference, the criteria
for evaluation of an information system cannot be specified
as technological ones. They must be specified as behavioral
ones. The development of criterion measures to
instruct the designers and operators of information systems
(whether journal editors, planners of computer-based
reference-retrieval systems, or whomever) about
what they should try to accomplish and how to measure
their success thus becomes a task for psychological
research (1).

The last sentence of this passage is incorrect. If it had
ended “becomes a task which can be helped by psychological
research,” it would be plausible. But it would also be plausible
if it had ended “becomes a task which can be helped by
history of science research” or “philosophy of science research,”
or “documentation research,” or “library science
research” etc.

The sentence as it stands claims a central, and apparently
self-sufficient, role for psychology in developing criteria for
evaluation of information systems. The paper from which
the sentence is taken provides no support for that claim.
Anyone who finds the quoted sentence plausible should note
in it the ambiguous occurrence of “psychological.” Specifically,
psychology as a “behavioral science” may take all human
behavior as its subject. But it does not follow
that psychological inquiry, as ..., can answer all questions about
that subject, all “behavioral” questions. For instance,
driving an automobile is behavior. But psychological research
... they ... measure their success,
though it can help in developing such instructions.

The point made in the closing four sentences of the preceding
paragraph is generally applicable to attempts to
apply the behavioral sciences in documentation.

1. PARKER, E. B. and W. J. PAISLEY, Research for Psychologists
at the Interface of the Scientist and his Information
System, American Psychologist, 21: 1069 (1966).


JOHN O’CONNOR
Center for the Information Sciences
Lehigh University

Ed. note: This letter is a by-product of the author’s research on information
retrieval, which is sponsored by the Information Systems Branch,
Office of Naval Research (Contract Nonr 2800014-66-00098). Reproduction
in whole or in part is permitted for any purpose of the United States
Government.


Book Reviews


3/67-1R Data-Retrieval by Computer: A Critical
Survey. 1966. Asa Kasher. The Hebrew University, Jerusalem,
Israel. 76 pp.


“critical”:
1. Inclined to criticize, esp. unfavorably;
captious; censorious
2. Exercising, or involving, careful judgment;
exact; nicely judicious


This report defines “data retrieval” in a very unfamiliar
but completely valid way, so as to encompass both data base
systems and text processing systems. But it devotes most of
its critical analysis to what have generally been called
“question answering systems,” which makes the use of the
term “data retrieval” particularly strange. That is, it is
concerned with a critical evaluation of a number of computer
programs which accept natural language sentences as
input and from them generate logically acceptable output.
Such systems involve a combination of problems in language
data processing, file searching, and logical calculus. Table 1
presents a chronological listing of those which were considered
in this report. For each, there is a brief indication
of the approach which the system seems to have adopted in
each of the three areas of analytical processing.

The prospective reader will find it advisable first to read
“Answering English Questions by Computer: A Survey,”
R. F. Simmons, Comm. ACM 8 (1): 53-70, Jan. 1965, since
much of this report not only refers to Simmons’ survey, but
apparently has the intent of negating his evaluations as
well. Specifically, the Kasher report raises three primary
critical problems and a number of subsidiary ones. It claims
that:


1. None of the systems examined has an adequate method
for resolving linguistic ambiguity (of syntax, meaning, or
context) even though the descriptive reports about them
typically imply differently.


2. None of the systems has an adequate decision method
for handling logical consistency in the input data, since
they all ignore some basic theoretical problems in logic.


3. None of the systems has an adequate definition of
what constitute “questions” and “answers” since at best
they represent explications of specific types of questions
(or answers) on specific subjects.


Traditionally, a critical review is “critical” in sense 2
above. This one is more than critical, even in sense 1; it is
pejorative, sprinkled liberally with phrases such as “methodological
errors . . . widespread misconception . . .
serious flaws which indicate its fallaciousness . . . claims
which attempt to disregard problems . . . devoid of practicality
in an essential way . . . ignoring . . .
theoretical results . . . results are almost trivial . . . absurdity
of his claim.”

The view of the report as a whole is summed up on pages
66 and 67 as follows: “There are those who consider that
. . . question-answering deals only trivially with a trivial
sub-set of English. . . . The faults are far more serious, in
that they stem from grave difficulties of principle.” On the
basis of its critical analysis, the report concludes that
“. . . the only hope for success in the near future is in
well-structured data-based systems, having a special internal
structure appropriate to a specific field, a reliable technical
language, and a competent inference-mechanism, the latter
taking account of the special internal structure and based,
as far as possible, on non-classical calculi.”

The report’s conclusion seems eminently reasonable, but
one wonders in what way the question-answering systems
which are attacked in fact depart from it. As the report’s
analysis demonstrates, there is no essential difference between
the “data-based” systems and text-based ones (under
the proviso, which the report so strangely adopts, of English
language input to each); each of the systems indeed embodies
a special internal structure, presumably appropriate
to its specific field; each utilizes a specific technical language
which, while it tries to approximate English, is presumably
reliable; and most of them include an inference mechanism,
usually a variant of the predicate calculus, each presumably
based on its special internal structure (although its “competence”
may be subject to question). Why then the need
to attack what is essentially a straw man, namely, that the
designers of these systems have the hope and desire of
making a positive step toward a formalized means of
handling natural language? Admittedly, the descriptions of
desires have become an almost ubiquitous part of what
should simply be reports of results. But the informed
reader of these reports has by this time surely learned to
filter them out and evaluate the actual results for what they
are: operating models, more or less illuminating as realizations
of prospective theories, which also provide the means
for applying those theories to a relatively large number of
examples. The pity is that someone so capable of “critical”
review did not also attempt to define and evaluate the
positive contributions which each of the projects he analyzed
has made to the goal which the report itself defines
as the “hope of success.”

As a step toward such an evaluation, the techniques each
system uses can be roughly classified into the three categories
defined previously: language data processing, file
searching, and data reduction (including logical calculus).
Because it is in fact the combination of problems (and
techniques for solving them) which the systems have tried
to handle, it seems more worthwhile to evaluate the effective
contributions of each. To this end, it would be useful to
analyze each of the systems in terms of the following successive
areas of complexity, and perhaps quantify them by
the size of the tables involved in each project:


1. The length of strings which can be accepted, as a
group, for analysis.

2. The vocabulary, quantified by the number of pre-stored
terms the program handles.

3. The semantic ambiguity, quantified by the average
number of separate meanings a representative term may
have.

4. The syntactic ambiguity, quantified by the average
number of syntactic roles for terms.

5. The richness of syntactic patterns, quantified by the
number of defined sentence patterns.

6. The magnitude of the retrieval task, quantified by the
number of stored sentences.

7. The complexity of the measurement of degree of
match, quantified by the number of alternations of logical
operators.

8. The complexity of the logical analysis, quantified by
the number of stored sentences which can be simultaneously
considered.
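
As a rough illustration only, the eight measures listed above could be recorded as one profile per system and compared measure by measure. The sketch below is in Python; the class name, the field names, and every figure in it are assumptions invented for the example, and none of them is taken from the report or from any of the systems it examines.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SystemProfile:
    """Table sizes for one question-answering system (all values hypothetical)."""
    name: str
    max_string_length: int       # 1. longest string accepted as a group
    vocabulary_size: int         # 2. number of pre-stored terms
    meanings_per_term: float     # 3. average separate meanings per term
    roles_per_term: float        # 4. average syntactic roles per term
    sentence_patterns: int       # 5. number of defined sentence patterns
    stored_sentences: int        # 6. number of stored sentences
    operator_alternations: int   # 7. alternations of logical operators
    simultaneous_sentences: int  # 8. stored sentences considered at once


def measures(profile):
    """Return the eight numeric measures as a dict, leaving out the name."""
    return {f.name: getattr(profile, f.name)
            for f in fields(profile) if f.name != "name"}


def dominates(a, b):
    """True if `a` is at least as large as `b` on every measure and larger on one."""
    ma, mb = measures(a), measures(b)
    return all(ma[k] >= mb[k] for k in ma) and any(ma[k] > mb[k] for k in ma)


# Invented figures for two imaginary systems, purely for illustration.
system_x = SystemProfile("X", 20, 500, 1.5, 1.2, 30, 1000, 2, 3)
system_y = SystemProfile("Y", 12, 200, 1.1, 1.2, 10, 400, 1, 1)

print(dominates(system_x, system_y))
print(dominates(system_y, system_x))
```

A comparison of this kind says only which system handles the larger task on each count; relating the sizes to processing times and equipment costs would require further data.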


The characterization of the complex tasks of language
analysis and logical inference in such simple terms as “size
of stored tables” is admittedly a gross over-simplification,
which certainly doesn’t even begin to recognize the theoretical
problems involved. However, as Kasher in his report
rightly points out, at best the systems analyzed are concerned
with the development of “technique” and do not
really provide any real insight into the theory of language
or logic. The issue then is the power or effectiveness and
this can be measured by the size of task which can be encompassed,
particularly if this can be related to the
processing times and equipment costs.


ROBERT M. HAYES
Institute of Library Research
University of California


3/67-2R National Library of Medicine Current Catalog.
Volume 1. Jan. 1-14, 1966. U.S. Public Health Service,
Washington, D. C. Biweekly, cumulated quarterly
from the first of the current year; annual cumulation casebound.
Supersedes National Library of Medicine Catalog.
Sold by the Superintendent of Documents: 1966 price,
$15.00 a year ($20.00 foreign) with annual cumulation also
sold separately for $4.50; price increase to be announced.


Current Catalog is not likely to hold still long enough for
description in the timeless or at least timely terms reviewers
would like to apply to books. Having launched it as a
somewhat tentative venture in using new techniques to
meet old goals (its machine system is called an “interim
module”), its mentors not only have had their own plans
from the start for further development, but also have been
actively seeking users’ criticisms of both its purposes and its
performance with the implication that these responses will
affect its future.

Preparation for Current Catalog began during the period 
when Dr. Frank B. Rogers led the National Library of 
Medicine beyond a past of proud distinction to revolutionary
advances in organizational status, physical facilities,
and bibliographic services, which culminated in the remarkable
production of the Index Medicus as one output of a
computerized Medical Literature Analysis and Retrieval 
System (MEDLARS). Many people have guided the de- 
velopment of Current Catalog, some now gone from NLM 
(Samuel Lazerow, then Chief, Technical Services Division, 
and Irvin Weiss, a systems analyst), some still there (Scott 
Adams, Deputy Director), and others recently arrived (Dr. 
Martin M. Cummings, Director since 1964). Most directly 
responsible now are James P. Riley, Chief, Technical Ser- 
vices Division, and Emilie V. Wiggins, Head, Catalog Sec- 
tion. 

A printed catalog of the books in the National Library of
Medicine is not new. One of the earliest was a 454-page
volume issued in 1872, followed by a three-volume edition
in 1873-74. (The “first catalogue” was an 1840 manuscript
“containing 23 unnumbered leaves and listing 130 titles,”
not printed, however, until a facsimile edition was published
in 1961). The monumental Index-Catalogue of the Library
of the Surgeon General’s Office, whose sixty-one volumes
issued in five series from 1880 to 1961 record on their title
pages the evolution of official names from “Library of the
Surgeon General’s Office, U. S. Army,” through “Army
Medical Library” and “Armed Forces Medical Library,” to
the present Congressional designation as “National Library
of Medicine” (under the Public Health Service), contained
author and subject entries for periodical articles. Before
the decision was implemented to finish listing those books
with imprints through 1950 by issuing the three volumes of
Series 5 (1959-61), the Library began publishing as a new
work, supplementary to the Library of Congress Catalog,
an annual volume recording its book cataloging of the previous
year: an April-December 1948 volume and a 1949
volume were issued before the regular annual series of the
National Library of Medicine Catalog (earlier titles: Army
Medical Library Catalog and Armed Forces Medical Library
Catalog) began in 1950, with superseding five-year
cumulations following for 1950-54 and 1955-59, and a six-year
cumulation for 1960-65 soon to appear as an end to
that series. In addition, from 1960 through 1965 a monthly
list of catalog main entries for selected U. S. books and
periodical titles appeared without cumulation at the end
of each Index Medicus issue, entitled “Recent United
States Publications.”

The Current Catalog continues the record of NLM book
cataloging from January 1966. As an NLM published catalog,
it is new in that it attempts rapid reporting of cataloging
data through biweekly issues, cumulated each quarter
(“Cumulative Listing”) for the entire current year to date,
with a bound cumulation to appear annually and perhaps
larger cumulations to follow later (although no announcement
has been made on the possibility of larger cumulations).
The most similar existing service is The National
Union Catalog, which the Library of Congress issues
monthly with quarterly cumulations, but which does not
provide the same degree of currency in reporting or cumulating
as Current Catalog promises, nor the same complete
listing of added entries. One may assume, further, that
neither NUC nor any other published book catalog covers
biomedical books as comprehensively as does Current Catalog.

Current Catalog is new also in that its data are first
machine encoded, next manipulated by a computer to
alphabetize them under all desired entries with varying
parts of the data appearing under the several types of entry,
then composed by the computer in column and page format,
and, finally, by GRACE (Graphic Arts Composing
Equipment), NLM’s computer-driven, high speed Photon
900, reproduced automatically in high quality type fonts
on photographic negatives ready for offset printing. The
process, similar to that by which Index Medicus and other
recurring bibliographies are prepared, thus combines speed
of editorial assembly and page composition with the readable
appearance of traditional book type (upper and
lower case, boldface and roman, serifs, and diacritical
marks), although the space-saving six-point type is still a
frustration to tired eyes. In addition, the machinable records
have the potential for such varied new uses as computer
compilation of demand bibliographies and distribution via
punched cards, punched paper tape, or magnetic tape to
other libraries for their use in producing catalog cards or in
computer searching.
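
The first stages of the process just described (encode a record, file it under every desired entry with a different portion of the data under each type of entry, alphabetize, and break into pages) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NLM's program: the record fields, the wording of each entry type, the sample titles, and the page length are all hypothetical.

```python
from collections import namedtuple

# A machine-encoded cataloging record (field names are hypothetical).
Record = namedtuple("Record", "author title subjects citation_no")


def expand_entries(record):
    """Yield (heading, text) pairs: one main entry plus the added entries."""
    # The main entry under the author carries the fullest citation.
    yield record.author, f"{record.author}. {record.title}. [{record.citation_no}]"
    # The title entry is brief and points back to the main entry.
    yield record.title, f"{record.title}. See {record.author}."
    # Each subject heading receives a shortened citation.
    for subject in record.subjects:
        yield subject, f"{record.author}. {record.title}."


def compose(records, entries_per_page=3):
    """Alphabetize all entries by heading, then split them into pages."""
    entries = sorted(
        (entry for record in records for entry in expand_entries(record)),
        key=lambda entry: entry[0].lower(),
    )
    return [entries[i:i + entries_per_page]
            for i in range(0, len(entries), entries_per_page)]


# Two invented records, for illustration only.
sample = [
    Record("Smith, J.", "Anatomy of the Heart", ["Cardiology"], 101),
    Record("Adams, R.", "Blood Chemistry", ["Hematology", "Biochemistry"], 102),
]

pages = compose(sample)
for number, page in enumerate(pages, start=1):
    print(f"Page {number}")
    for heading, text in page:
        print(f"  {heading}: {text}")
```

The photocomposition step performed by GRACE has no counterpart in this sketch, which stops at the point where page contents have been decided.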

At the moment, however, NLM does not have programs
ready for selective searching of the book cataloging data
as it does for MEDLARS periodical indexing data (a
cataloging record can be accessed only by citation number
to make corrections), and it has not merged any of
these book entries with its MEDLARS data file (pre-1964
plans called for beginning with a combined file, but the
problems were too great).


The catalog entries in Current Catalog are the result of
NLM’s traditional book cataloging practices, which presently
are not accompanied by subject cataloging in depth
(assigning more than the average two or three subject
headings in order to identify specific and multiple subject
aspects useful in computer searching). Although the printed
Current Catalog would not itself reflect such additional
subject cataloging, the project surely must attempt someday
to benefit from the opportunities which machine searching
presents for more effective storage and retrieval of
information from books than traditional subject cataloging
allows. The theory and procedure of subject cataloging
books in depth are full of problems, however, which require
more than machines or increased manpower to solve.

The uses and stated purposes of Current Catalog are
several: it is a comprehensive announcement list from
which other libraries may select new acquisitions; it is a
source of cataloging data to assist cataloging efforts in
other libraries; it is a permanent reference tool for manual
searching of the literature under names, titles, and subjects.

Current Catalog's attempts to publish cataloging data in
time for use in acquisitions and cataloging activities of 
other libraries so far have been only partially successful. 
NLM has arranged to receive advance copies of domestic 
publications and by a tour of Europe by one of its staff 
members has made similar arrangements with European 
publishers, but these arrangements have not yet (January 
1967) resulted in the rapid cataloging desired. Several 
libraries have reported recent comparisons of current im- 
prints listed in their published accessions lists with those in 
all issues of Current Catalog through the corresponding 
date; the comparisons showed a large proportion of rele- 
vant new titles not listed in Current Catalog. While it is 
an excellent complementary selection tool, especially for 
government and foreign books and for Public Health Service
and other government contractors’ biomedical reports,
it cannot yet be used as a medical library’s primary selection
source.

American Documentation, July 1967, page 189

Delays have resulted at other points in the Current Catalog
production process in addition to that of cataloging
input. Most obvious is the delay in distribution after
GRACE has readied the page negatives for the nongovernment
printer, who usually gets the printed issues to the
Government Printing Office mailers within four days. While
each 1966 biweekly issue was intended to reach subscribers
on the final “coverage” date then printed on the issue, this
reviewer's library in common with others regularly was
receiving its issues over two weeks later. The U. S. mail
might be a bottleneck, but the GPO mail room seems the
culprit of choice. NLM and some of the subscribers have
recently toyed with the idea of sending advance photocopies
as an experiment in reducing the delivery time, but no mass
solution is yet indicated.

In addition, a proposed distribution of machinable cards
or tapes prior to the printing of each issue may speed
delivery to those libraries able to use such records. (A
breakdown of GRACE in December and January delayed the
photographic copy of the first 1967 issue for eight days.
The Library announced that it would arrange for back-up
equipment to prevent similar future delays.) Speed in production
is perhaps most notable in issuing the cumulations: the
first annual cumulation is expected at this writing to be in
the hands of the mailers by the first week in February,
sooner than the former annual catalog used to be.

An early change was made in page content to assist visual
scanning for selection purposes, as well as to save space. At
first, the full citation (e.g. author, title, edition, place,
publisher, date, collation, series, other notes, tracings,
call number, an in-house citation number, and price, when
known) was reprinted from author through date plus the
call number and citation number under all added entries
(joint authors, editors, title, series, etc.), but beginning
with the third quarter of 1966 (issue of July 2-14) the added
headings were entered as “see” references to the main entry
heading without reprinting of any other part of the citation
except the citation number (and in 1967 the citation number
also will be omitted from these cross-references, so that a
single cross-reference might suffice to cover more than one
title or version). The resulting short block of print, it has
been suggested, will signal added entries which can be
skipped when scanning for selection purposes, although some
few main entries are themselves equally short and must be
watched for. The main entry heading, whether in the main
entry itself or in a cross-reference, is in boldface as an
added aid to scanning, but six-point boldface and roman
type can very quickly blur into the indistinguishable.

The biweekly issues, 20 by 25.7 cm. in size and in two- 
column format, include main and added entries plus subject 
entries for persons or corporate bodies for imprints of the 
current or two preceding years, but not topical subject 
entries (although full tracings are shown under each main 
entry). Each biweekly issue has at the back a separate list
of “Added Volumes” newly received and an alphabetical 
“Directory of Publishers” of books with their addresses 
and a sublisting in citation-number order of addresses of 
publishers of serials, all very useful for acquisitions pur- 
poses. Newly cataloged earlier imprints (except pre-1801 
imprints and Americana) and, in a separate “Subject Sec- 
tion,” full citations except for tracings and price under all 
topical subject headings (which NLM with some exceptions 
does not assign to books over twenty-five years old) are 
added to the quarterly cumulations (the pages are larger,
23.5 by 29.5 cm., and hold three columns). The quarterlies
further differ from biweeklies in including cross-references
from variant forms of names. The January-September 1966
quarterly, third for the year, contained a Subject Section of
374 pages and a Name (and title) Section of 457 pages.

Recent NLM discussions with subscribers have involved
a suggestion that topical subject entries would be useful for
selection and reference purposes even in the biweekly issues,
and, according to the National Library of Medicine News
21 (12): 4-5, Dec. 1966, a subject listing will be included
in the biweekly issues, to be enlarged to the same three-column
format as the quarterlies, beginning January 2,
1967 (the date of the issue will then be the closing date for
corrections and changes), with an accompanying price
increase to be announced. Some subscribers have suggested
adding tracings for name cross-references and including
these in the biweekly issues for catalogers’ use. Some have
proposed adding union catalog holdings information, a 
separate supplement of new serial titles, and Library of 
Congress subject headings and classification numbers. 

The new Anglo-American cataloging code is apparently
being followed more by NLM than it will be by LC (since
1966, NLM has been using a draft copy of the new
code, although it plans to wait for future machine assistance
before making certain changes, such as dropping
“U. S.” at the beginning of every entry for the National
Library of Medicine). It would be highly useful if the several
national distributors of centralized or shared cataloging
could more consistently accept and apply rules of entry 
and description. It is one thing to ensure finding of a book 
by putting multiple entries and cross-references in a card,
printed, or computerized catalog, but it is not satisfactory 
to have a variety of possible main entries when only one 
entry is to be listed in selective bibliographies, in a library’s 
acquisitions files, or on temporary catalog slips. (A curious 
example of variation in main entry is a comparison of
New Serial Titles corporate main entry, “U. S. National
Library of Medicine. Current catalog” with NLM's title
main entry, “National Library of Medicine current catalog.”
The former follows past ALA rules, while the latter
demonstrates NLM's preference for a title main entry
when a corporate name comes first in the title but would
have to be altered to follow ALA rules (in this case by adding
“U. S.,” although the Anglo-American code will drop
this particular “U. S.”). Otherwise, NLM too has followed
ALA rules for corporate entry of periodicals.) (It so happens
that the title-page form of title entry for periodicals
is usually the more useful to readers familiar only with the
citations used in indexing and abstracting services and in 
the periodical literature itself, and one might conclude that 
the new Anglo-American code, which retains the corporate 
entry rule for periodical titles like Journal of the American 
Medical Association, does not wholly suffice as a basis of
agreement, although it will be the best code libraries have 
ever had.) Complete consistency in main entry is impos- 
sible, needless to say, but a concerted attempt by major 
libraries to accept a specific main entry for a specific book 
or serial might be found useful upon reevaluation of the 
problems which are supposed to prevent such unanimity. 
The results would assist readers in discovering whether a 
particular book is held by a library, as well as reduce the 
expensive adaptations of shared cataloging which go on in 
library after library. 

The National Library of Medicine has been seeking sug- 
gestions not only from subscribers to Current Catalog, but 
also from independent systems analysts (the Auerbach 
Corporation), who are helping to design improvements in
many aspects of NLM’s data storage and retrieval activities,
including MEDLARS, the Current Catalog, acquisitions
and serials records, and ways of handling graphic images. 
Current Catalog may be expected to change even more as 
new methods are found to meet old and new goals at a 
price the smaller libraries can afford to pay (and the 
original price was an outright bargain). 

Current Catalog is an essential part of the bibliographic 
apparatus, which every library seriously interested in 
medicine or the biomedical sciences must have and use. 


STANLEY D. TRUELSON, JR.
Yale Medical Library 


3/67-3R Scientific Management of Library Operations.
1966. Richard M. Dougherty and Fred J. Heinritz.
The Scarecrow Press, Inc., New York, 252 pp. 


It is a pleasure to commend this volume to the readers of
American Documentation. As a long-time admirer of the
work of Ralph Shaw, I had come in recent years to think
that the down-to-earth, practical aspects of work analysis in
relation to library objectives were being neglected in favor
of the more sophisticated “systems-analysis” approach associated
with automation. It was a double pleasure, therefore,
to find this handbook type of presentation, effectively combining
general statements of principle with precise advice
and directions on how to go about making management
studies. The volume has the further advantage of a
judicious list of references for further study, including such
classics in the field as those by Frederick W. Taylor and
Lyndall Urwick, as well as those as recent as 1964. The
work of some 258 pages, including index and numerous
charts and illustrations, has thirteen chapters. Key word
or phrase subheadings, both in the text and in the table of
contents, facilitate quick reference and orientation to what
is to follow. It is possible, accordingly, for the prospective
user of the volume to determine almost at a glance what
might be relevant to a particular problem.

The chapter headings will in themselves give something
of the flavor of the book: I. Scientific Management: What
It Is and Is Not; II. Making a Management Study; III.
The Flow Process Chart, Flow Diagram and Block Diagram;
IV. Decision Flow Charting; V. Operations Analysis
including Some Principles of Motion Economy; VI. Forms:
Their Analysis, Control and Design; VII. Time Study;
VIII. Sampling; IX. Aids to Computation; X. Cost; XI.
Performance Standards and Control; XII. Study of a Circulation
System: The Present Method; XIII. Study of a
Circulation System: The Proposed Method.

Moving from these relatively general topical headings,
the text itself is presented in a direct and lively style, but
at the same time with the kind of restraint that is all the
more persuasive because one feels that he is being talked
with rather than being talked at. Examples of this approach
can be found on almost every page. On page 16:
“Libraries share with the multitude of other governmental
and public service organizations in any community the
responsibility for giving the taxpayer a maximum return of
service for each dollar invested.” A little later (page 17),
“In addition to improving routine efficiency, management is
a useful tool of library personnel management and financial
administration. Work analysis is the key to modern job
classification. Only when we have ascertained of what the
job consists, and what level of productivity we may reasonably
expect of the person performing it, are we able to
define intelligently what sort of innate ability and special
training are necessary for its performance.”

The rhetorical question and its variations are used
fairly frequently, but again, for this type of volume, effectively.
On page 17 for example, “This argument against
the efficacy of scientific management in libraries disregards
the very substantial part of library work that consists of
repetitive, mechanical routines that lend themselves readily
to quantitative analysis. In terms of total hours required
for performance, the largest bulk of library work, perhaps
as much as 70 to 90% of all current library tasks, consists
of such routines.”

The practical approach is stressed throughout. Under 
the heading “Selecting an Area for Study,” one finds the 
subheading, “Frequently Performed Jobs,” and then in the 
text, “Since the time, money, or energy are not available to 
study everything and everybody, our efforts must be con- 
centrated upon areas that are likely to yield the highest 
return for our study investment. The more frequently an 
operation is performed, the better a candidate it is for 
analysis. The reason for this is that even if we are able
through improvements to save only a small amount of 
time each time the operation is performed, this saving 
multiplied by the high frequency makes the total time saved 
substantial.” Although the statement would be obvious 
and self-evident to the experienced person, it is precisely 
this kind of information which should be part of the educa- 
tion of all librarians, whether or not they themselves are 
to be engaged in this type of management work. 
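
The arithmetic behind the quoted passage can be sketched in a
few lines. This is only an illustration of "saving multiplied
by frequency"; the function name, the 250 working days, and
all of the figures are hypothetical and do not come from the
book under review.

```python
def annual_hours_saved(seconds_saved_per_run: float,
                       runs_per_day: float,
                       working_days: int = 250) -> float:
    """Total hours saved per year: saving per run x frequency.

    All inputs are illustrative assumptions, not data from the book.
    """
    total_seconds = seconds_saved_per_run * runs_per_day * working_days
    return total_seconds / 3600


# A small saving on a very frequent operation (hypothetical figures):
# 10 seconds saved on a task performed 400 times a day.
frequent = annual_hours_saved(10, 400)

# A large saving on a rare operation (hypothetical figures):
# 10 minutes saved on a task performed about twice a week.
rare = annual_hours_saved(600, 0.4)

print(f"frequent task: {frequent:.1f} hours per year")
print(f"rare task:     {rare:.1f} hours per year")
```

Under these assumed figures the frequent operation yields the
larger total, which is the point the authors make about where
to concentrate a study.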

A further convenience of the volume is the extent to 
which background information necessary in applying these 
techniques to library operations is included. In the chapter 
on sampling, for example, one finds: “The closer to 100% 
certainty that an investigator demands that his sample
approach, the larger it will have to be. He must therefore 
begin by making two decisions as to how reliable an answer 
he needs or desires. The most common practice is to use 
a 95% confidence level. This means that the sampler can 
be confident that his random observations will represent 
the facts 95% of the time. It also means that 5% of the 
time they will not. . . . The next most common confidence level
is 99%. Since a 99% confidence level is seldom necessary in
management practice, and since the 4% increase in certainty
may require a substantial increase in sample size, it is
recommended that it be used sparingly.” 
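
How much larger a sample the move from 95% to 99% demands can
be sketched with the usual normal-approximation formula for
estimating a proportion, n = z^2 p(1 - p) / e^2. This is a
minimal sketch under assumptions that are not taken from the
book: simple random sampling, the worst-case proportion
p = 0.5, and a margin of error of five percentage points. The
function name is likewise hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence: float,
                margin: float = 0.05,
                proportion: float = 0.5) -> int:
    """Observations needed to estimate a proportion.

    Uses n = z^2 * p * (1 - p) / e^2 (normal approximation).
    The default margin and proportion are illustrative assumptions.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z ** 2) * proportion * (1 - proportion) / (margin ** 2)
    return math.ceil(n)


for level in (0.95, 0.99):
    print(f"{level:.0%} confidence: {sample_size(level)} observations")
```

With these assumed settings the 99% level calls for roughly
three quarters more observations than the 95% level, which is
consistent with the authors' advice to use it sparingly.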

The examples given above illustrate another characteristic 
of the presentation throughout the volume, namely, the 
self-confidence with which the advice, conclusions, and 
recommendations are given. While omission of qualifying 
phrases may result in an oversimplification of the problems 
inherent in making management studies, the direct, con- 
cise, and lucid text is to be preferred for the purposes for 
which the volume was written. 

While librarians and others who have been following the 
literature of this field through the years will find little that 
is new, Dougherty and Heinritz have made a real contribution
to the literature of librarianship by bringing so much
information from past and current practice together in one 
convenient volume. 


RICHARD H. LOGSDON
Director
Columbia University Libraries


AMERICAN DOCUMENTATION


INSTRUCTIONS TO AUTHORS


American Documentation is a publication of the American
Documentation Institute. It is a scholarly journal in the
various fields in documentation and serves as a forum for
discussion and experimentation. Papers already published or
in press elsewhere are not acceptable. For each proposed
contribution, one original and two copies (in English only)
should be mailed to Mr. Arthur W. Elias, Editor, American
Documentation, Institute for Scientific Information,
325 Chestnut St., Philadelphia, Pennsylvania 19106. The
manuscript should be mailed flat in a suitable-sized envelope.
Graphic materials should be submitted with suitable
cardboard backing.


TYPES OF MANUSCRIPTS: Three types of contributions are
considered for publication: full-length articles, brief communications
of 1,000 words or less, and letters to the editor.
Letters and brief communications can generally be pub- 
lished sooner than full-length manuscripts. Books, mono- 
graphs, and reports are accepted for critical review. Two 
copies should be addressed to the Review Editor, Dr. 
T. Hines, 54 North Drive, East Brunswick, New Jersey. 


PROCESSING: Acknowledgment will be made of receipt of
all manuscripts. American Documentation employs a reviewing
procedure in which all manuscripts are sent to two
referees for comment. When both referees have replied,
copies of their comments are sent to authors with the 
Editor's decision as to acceptability. The refereeing pro- 
cedure requires about 30 days. Authors receive galley proofs 
with a five-day allowance for corrections. Standard proof- 
reading marks should be employed. Reprint order forms are 
forwarded with galleys. 

FORMAT: All contributions should be typewritten on white
bond paper on one side only, leaving about 1.25 inches (or
3 cm) of space around all margins of standard, letter-size
(8.5 x 11 inch) paper. Double spacing must be used throughout,
including the title page, tables, legends, and references.
The first page of the manuscript should carry both the first
and last names of all authors, the institutions or organizations
with which the authors are affiliated, and notation as
to which author should receive the galleys for proofreading.
All succeeding pages should carry the last name of the first
author in the upper right-hand corner (0.5 inch from the
top) and the number of the page.


STYLE: In general, style should follow the rules given in
the Style Manual for Biological Journals (SMBJ), published
for the Conference of Biological Editors by the American
Institute of Biological Sciences (1964).


TITLE: The title should be as brief, specific, and descriptive
as possible. Vague and unrevealing titles may delay
publication.


ABSTRACT: An informative abstract of 200 words or less 
must be included, typed with double spacing on a separate 
sheet. This abstract should present the scope of the work,
methods, results, and conclusions. 


ACKNOWLEDGMENTS: Financial support may be listed as 
a footnote to the title. Credit for materials and technical 
assistance or advice may be cited in a section headed 
“Acknowledgments,” which should appear at the end of 
the text. General use of footnotes in the text should be 
avoided. 


GRAPHIC MATERIALS: American Documentation requires
finished artwork. Follow the style in current issues for layout
and type faces in tables and figures. A table or figure
should be constructed so as to be completely intelligible 
without further reference to the text. Lengthy tabulations 
of essentially similar data should be avoided. 

Figures should be lettered in black India ink. Charts
drawn in India ink should be so executed throughout, with
no typewritten material included. Letters and numbers appearing
in figures should be distinct and large enough so
that no character will be less than 2 mm high after reduction.
A line 0.4 mm wide reproduces satisfactorily when
reduced by one-half. Graphs, charts, and photographs should
be given consecutive figure numbers as they will appear in 
the text; however, figure numbers and legends should not 
appear as part of the figure, but should be typed double 
spaced on a separate sheet of paper. Each figure should be 
marked lightly on the back with the figure number, author's 
name, complete address, and shortened title of the paper. 

For figures, the originals with two clearly legible repro- 
ductions (to be sent to referees) should accompany the 
manuscript. In the case of photographs, three glossy prints 
are required, preferably 8 X 10 inches. 


ORGANIZATION: In general, papers should state the background
and purpose of the study, followed by details of
methods, materials, procedures, and equipment. Findings, 
discussion, and conclusions should appear in that order. 
Appendixes may be employed where appropriate for ex- 
tensive lists, statistics, and other supporting data. 

BIBLIOGRAPHY: Accuracy and adequacy of the references
are the responsibility of the author. Therefore, literature 
cited should be checked carefully with the original publica- 
tions. References to personal letters, abstracts of verbal 
reports, and other unedited material may be included. If
an as-yet-unpublished paper would be helpful in the evaluation
of a manuscript, it is advisable to make a copy of it
available to the Editor. When a manuscript is one of a 
series of papers, the preceding member of the series should 
be included in literature cited. 

CITATION FORMAT: 

Order: Literature cited should be sequentially numbered 
as cited.

Authors: Give all authors with arrangement as follows: 

Elias, A. W., B. H. Weil, and I. D. Welt 


Titles: Give full titles of articles in English, indicating
language of original as: (In Ger.) 


Journals: Journal titles should be given in full. 


MONOGRAPH AND SERIAL DATA: Should be presented in
order as follows: Volume, issue number, pagination, and
year. The issue number should be given in parentheses if
journal pagination is not continuous from issue to issue.
Pagination should be inclusive. Year of publication should
be given in parentheses. An example is given below:

Bishop, D., A. L. Milner, and F. W. Roper, Publication 
Patterns of Scientific Serials, American Documentation, 
16 (No. 2): 118-21 (1965). 
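
The ordering rules above can be checked mechanically. The
following is a minimal sketch that assembles a reference in the
order the instructions give and reproduces the printed example;
the function names and the (initials, surname) input layout are
hypothetical, and the handling of a reference with exactly two
authors is an assumption, since the instructions show only a
three-author case.

```python
from typing import List, Optional, Tuple

Author = Tuple[str, str]  # (initials, surname), e.g. ("A. W.", "Elias")


def format_authors(authors: List[Author]) -> str:
    """First author inverted, the rest in natural order."""
    first_initials, first_surname = authors[0]
    names = [f"{first_surname}, {first_initials}"]
    names += [f"{initials} {surname}" for initials, surname in authors[1:]]
    if len(names) == 1:
        return names[0]
    # Assumption: "and" before the last author for any count above one.
    return ", ".join(names[:-1]) + f", and {names[-1]}"


def format_citation(authors: List[Author], title: str, journal: str,
                    volume: int, pages: str, year: int,
                    issue: Optional[int] = None) -> str:
    """Volume, issue number (in parentheses), pagination, year."""
    issue_part = f" (No. {issue})" if issue is not None else ""
    return (f"{format_authors(authors)}, {title}, {journal}, "
            f"{volume}{issue_part}: {pages} ({year}).")


example = format_citation(
    authors=[("D.", "Bishop"), ("A. L.", "Milner"), ("F. W.", "Roper")],
    title="Publication Patterns of Scientific Serials",
    journal="American Documentation",
    volume=16, issue=2, pages="118-21", year=1965,
)
print(example)
```

Passing issue=None omits the parenthesized issue number, which
matches the rule for journals paginated continuously from issue
to issue.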

American Documentation is published in January, April,
July, and October. One copy is included in the individual
membership fee ($20.00 per year), three copies in the contributing
membership fee ($100.00 per year), and up to five
copies in the sustaining membership fee ($500.00 per year).
Nonmembers may subscribe at $18.50 per year, postpaid in
the U. S. Single copies may be purchased for $4.65 each.
Communications concerning memberships, subscriptions, re- 
prints, renewals, back issues, advertising, and changes of 
address should be sent to the American Documentation 
Institute, 2000 P Street, NW, Washington, D. C. 20036. 
American Documentation is indexed in Library Literature,
Current Contents of Space, Electronic & Physical
Sciences, Library Science Abstracts, Science Citation Index,
Chemical Abstracts, and Documentation Abstracts.


American Documentation is entered for second class mailing
at Baltimore, Maryland.


General Model of Information Transfer:
Theme Paper 1968 Annual Convention


A general model of information transfer establishes a
conceptual framework for contributed papers for
the 1968 ADI Convention in Columbus, Ohio, October
20-24, 1968. The general model is an elaboration
on the classic sender/channel/receiver model and
presents a variety of alternative channels for information
transfer including direct transfer, primary recorded
media, archives, secondary recorded media,
and information centers. Suggested areas for response
in the form of contributed papers include costs, performance,
benefits, functions, application of scientific
and technical disciplines, research, vocabulary control,
and language processing associated with information
systems, science, and technology. A call
for papers for the 1968 ADI Convention is included.


JOHN W. MURDOCK and
DAVID M. LISTON, JR.


Battelle Memorial Institute
Columbus Laboratories
Columbus, Ohio


* Prologue

Information Transfer!! That’s the theme of the 1968
Convention of the American Documentation Institute to
be held in Columbus, Ohio, October 20-24, 1968.

The technical committee of the convention believes that
most authors reporting on information work and research
believe that their efforts will in some way improve the
transfer of information. The committee plans to use this
common interest to give a special coherence to both the
convention and the published proceedings. The plan is
to establish a conceptual structure for the technical program
in the form of a general model presented in the
following “theme paper” on information transfer. It is
conceived that authors will be able to respond within
their own specific areas to the broad structure established
by the general model. To foster this process, the theme
paper presents the general model and poses questions
about many of the specific problem areas contained therein.
Its purpose is to promote thought and response
in the form of contributed papers which will provide
the backbone of the technical program and be logically
interrelated by the structure of the general model. Each
contributing author will be requested to introduce his
paper with a description of the correlation between the
model and his specific subject area. Papers highly correlated
with the theme will become the convention contributed
papers. Other papers of high quality judged to
be of interest to ADI members will provide the content
for the author forums. Thus, the following theme paper
heralds the call for papers for the 1968 ADI Convention.
Specific suggestions appearing throughout the text
for responding papers are set in italics to bring them
obviously to the reader’s attention.


* Introduction 


Inherent in at least one set of definitions of the 
words “knowledge” and “information” is the concept 
that an item of knowledge becomes an item of information
when it is “set in motion,” that is, when it enters the
active process of being communicated or transferred
from one or more persons, groups, or organizations 
(sender) to one or more other persons, groups, or or- 
ganizations (receiver). Many people will argue that 
knowledge as defined here has no intrinsic value—that 
only when it is successfully transferred is its value to be 
realized. Others go further, arguing that the value of 
information cannot be realized until it is actively applied
in decision making. Either of these viewpoints
must necessarily concede that value is dependent upon
transfer. Thus, information transfer is an important and
appropriate theme for the 1968 American Documentation
Institute Convention. This theme paper presents a generalized
model of information transfer to set the stage
for the convention’s technical program. The initial call
for papers is included as the final section. Some persons 
may wish to respond to the call for papers by exploring 
the idea of value being dependent on transfer. 


* The General Model 


Figure 1 presents graphically the general model of 
information transfer. It is immediately obvious that the 
model is based on the classic sender/channel/receiver
concept. In this case, there is a variety of alternative
channels.


THE VARIETY OF CHANNELS
AND THE COMMUNICATION CONTINUUM 


Communication between sender and receiver can occur 
at a number of levels along what is referred to as the 
“communication continuum.” This also was called the 
“feedback dimension” by Lawrence Berul (1). The au- 
thors believe the general model in Fig. 1 includes every 
type of communication channel for information transfer. 
The value of the model is in the possible orientation or 
perspective that it provides for authors to say “Here is 
where my specialty helps in the information transfer.” 
For example, in the situation of an individual who writes 
himself a note, the note is the primary recorded medium 
and his file of notes (or desk top or drawer) is the 
archive. He becomes the user when he wishes to retrieve 
the note. Sophistication is added when several people
prepare reports or write memos and the archives become
a central file. Further complexity is added when the
media include reports from outside the organization 
such as published literature. The archives now comprise 
a library or its equivalent. It is possible similarly to 
relate other information work to the model. 

The Direct Channel. One extreme of the communica- 
tion continuum (included in the direct, nonrecorded 
transfer channel of the model) is face-to-face discussion 
in which communication is:


1. Very direct.
2. Very dynamic, permitting the utilization of:

   * words, phrases, sentences, etc. (language);

   * gesticulations;

   * inflections of the voice;

   * interruptability, allowing the receiver to interrupt
     the sender requesting clarification of or
     elaboration on the message being spoken;

   * feedback, allowing the receiver to become the
     sender with reverse flow of information transfer;

3. Very rapid, with virtually no delay time involved.
Disadvantages primarily relate to: 


1. Faulty memory;
2. Little chance for study of what is transferred;
3. Frequent acceptability of vague generalizations
which would not be acceptable in a recorded
message.

Progressing from the point of face-to-face discussion 
along the communication continuum toward situations 
involving less directness, less dynamic transfer, and more 
time delay, one can visualize situations such as phone 
conversations, television broadcasting, and radio broad- 
casting. All of these types of transfer are signified by 
the direct channel from the originator to user
in the general model. 

The Primary Recorded Media Channel. Eventually
the point is reached where the originator feels that what 
he has to say should be recorded as part of the body 
of literature of his discipline. This publication is usually 
thought of as the primary literature dealing with current 
topics. Until the past 5 years, little was done to package 
primary literature for retrospective searching other than
providing periodic indexes. Probably much more could
be done to make it readily retrievable. It is hoped that
someone will consider writing on this subject in response 
to this paper. Other examples of primary recorded 
media are letters, newspapers, conference notes, technical 
reports, handbooks, monographs, texts, patents, and 
tapes. Each of these media is worthy of papers on 
information transfer. 

The Archival Channel. Because the user is not always 
sensitized to the flow of messages through the more 
current channels, the archival channel has developed to 
store information for subsequent delayed usage when the 
user becomes aware of a need for it. Document depots,
libraries, special libraries, and corporate files are all
forms of archival storage. Continued reporting of research
on improvement of archives is hoped for as input
to the 1968 Convention.

The Secondary Recorded Media Channel. The next 
channel for the transfer of information involves the 
secondary sources or media. It feeds from both the 
primary recorded media and archives and also becomes 
archival when collected into libraries and other holdings. 
The purpose of the secondary recorded media channel 
is to assist people to search, more easily, an ever increasing
volume of current and stored information for items
of interest. Secondary media such as abstract journals,
accessions bulletins, indexes, and bibliographies are faced
with increasing volumes of literature and with pressure
to reduce the time period for funnelling information
from the other channels into the secondary media channel.
This has increased costs sufficiently to make people
question whether value received is worth the cost. This 
controversy could lead to many interesting papers. 

The Information Center Channels. Information centers
have increased in importance in the past 10 years.
They represent an attempt to provide a service to es- 
sentially a known group of users upon demand. The 
information analysis center, in particular, attempts to 
utilize all information transfer channels to provide technical
answers to technical questions posed by users.
Thinking in terms of an electrical analogy to the model, 
information centers act as “switching centers” utilizing 
the “circuitry” of the channels in the most appropriate 
combination of series and parallel arrangements. 
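
The "circuitry" analogy suggests one way to picture the model:
as a directed graph whose routes from originator to user are
the available transfer channels. The sketch below is only an
illustration. Fig. 1 is not reproduced here, so the node names
and the particular set of edges are assumptions drawn from the
channel descriptions in the text, not a transcription of the
authors' figure.

```python
from typing import Dict, List

# Assumed edges, read from the prose descriptions of each channel.
CHANNELS: Dict[str, List[str]] = {
    "originator": ["user", "primary_media"],            # direct and recorded
    "primary_media": ["user", "archive",
                      "secondary_media", "information_center"],
    "archive": ["user", "secondary_media", "information_center"],
    "secondary_media": ["user", "archive", "information_center"],
    "information_center": ["user"],
    "user": [],
}


def transfer_paths(graph: Dict[str, List[str]],
                   start: str = "originator",
                   goal: str = "user") -> List[List[str]]:
    """List every route from start to goal that visits no node twice."""
    paths: List[List[str]] = []

    def walk(node: str, visited: List[str]) -> None:
        if node == goal:
            paths.append(visited)
            return
        for nxt in graph[node]:
            if nxt not in visited:  # avoid cycling archive <-> secondary
                walk(nxt, visited + [nxt])

    walk(start, [start])
    return paths


for path in transfer_paths(CHANNELS):
    print(" -> ".join(path))
```

Each printed route is one "series" arrangement of channels; an
information center, in the analogy, selects among such routes
or uses several of them in parallel.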

The concept of analysis centers has been applied pri- 
marily to technical disciplines and mission-oriented 
projects. Reports of applications of the analysis center 
concept to the social, political, and economic fields would
be of interest for the 1968 Convention. The functions 
and services of analysis centers were first described by
G. S. Simpson (2) at the 1961 Boston ADI Annual Meet- 
ing. The symbol used in Fig. 1 to represent information 
centers was presented at that time and has been used 
in several conferences and papers since 1961. The three 
parallel segments of the symbol represent the primary 
functions of the analysis center as described by Simpson. 
The top segment represents the acquisition function; the 
middle segment represents the storage and retrieval func- 
tion; and the bottom segment represents the primary 
function, analysis. In analysis centers as much as 80% 
of the budget is spent for the analysis of information by 
experts. The Special Interest Group of ADI on Analysis 
Centers is another recognition of the analysis center as 
an established activity in information transfer. Dr. 
Chalmers Sherwin said at the National Symposium on 
“Putting Information Retrieval to Work in the Office” 
on May 9, 1967, and in a paper (3) discussed at that
meeting that he felt that the analysis center concept 
would provide the answer to the national information 
problem for at least the next generation. This statement
might prompt some responses which would be of interest 
at the 1968 Convention. 

The more often used expression, “information center," 
also has as its main characteristic the response to a 
customer on demand. However, the information center 
is distinguished from the information analysis center
primarily by the lesser degree of analysis performed. 
Information centers respond to inquiries more specifically 
than libraries. For example, information centers often 
repackage information and often publish the new pack- 
age. The primary functions of information centers are 
acquisition, storage/retrieval, and direct responses to 
customer’s requests resulting in some publishing of 
special reports. Many hardware and system designers 
have worked on problems associated with improving 
information centers. Papers on all facets of methods 
and mechanisms to improve the operations of informa- 
tion centers are encouraged by the Committee. 


CYCLIC NATURE OF TRANSFER FROM
ORIGINATOR TO USER 


In a gross sense at least, the entire information transfer
model is cyclic in that users (as a group) are the
same people, sensors, or machines as the originators (as
a group). Even an individual has the problem of communicating
with himself across the time span of present
to future. This problem is especially important to the
individual as he promises himself to return eventually
to an item observed in the current literature which
cannot be read currently for any of a number of reasons.
In a generalized model of information systems, M. C.
Yovits and R. L. Ernst of The Ohio State University also
depict a cyclic flow (4). In the Yovits/Ernst model
(Fig. 2) the decision making function is analogous to
the originator/user elements of the model in Fig. 1. The
types of originators/users represented in the Fig. 1 model
include:

• Individual people
• Individual sensors
• Individual machines
• Industrial corporations
• Not-for-profits
• Nonprofits
• Universities
• Professional societies
• Federal Government
• State Government

RELEASE RESTRICTIONS 


Regardless of the type of channel utilized in trans- 
ferring a message, there are certain release restrictions 
which will impede the “free” transfer of information
from originator to user. Returning to our electrical 
analogy, these release restrictions would be much like 
resistances or impedances in the circuits connecting the 
originators to the users. Furthermore, the total resistance 
to flow would probably vary according to whether the 
resistances in the channels were applied in series or in 
parallel or in combinations of both. 
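
The electrical analogy can be made concrete with the usual rules for combining resistances. The sketch below is purely illustrative: the function names and the numeric "restriction" values are assumptions introduced here, not quantities taken from the model.

```python
def series(*resistances: float) -> float:
    """Restrictions met one after another simply add up."""
    return sum(resistances)


def parallel(*resistances: float) -> float:
    """Alternative routes lower the total: the reciprocals add."""
    if any(r == 0 for r in resistances):
        return 0.0  # one unrestricted route bypasses all the others
    return 1.0 / sum(1.0 / r for r in resistances)


# Hypothetical restriction values in arbitrary units.
print(series(2.0, 3.0))    # 5.0
print(parallel(2.0, 2.0))  # 1.0
```

Read this way, a message forced through several restricted channels in succession meets the sum of their restrictions, while a user who can reach the same originator through alternative channels meets less than the smallest single restriction.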
Everyone seems willing to grant that release restrictions
are real phenomena. Even at the level of face-to-face
communications, they exist in such forms as language
difficulties, personal reluctances to divulge facts,
and personal incapabilities of expression. Release re- 
strictions become more noticeable as the contact be- 
tween sender and receiver becomes progressively less 
direct—less face-to-face. One often does not write in a 
letter or say on the phone what he would say face-to- 
face. Thus, even though the release restrictions are not 
overtly applied, there is tacit adoption of restrictions 
as the contact between sender and receiver becomes 
more remote. But there is much we do not understand
about this impedance:

1. What is the magnitude of the impedance? What
percentage of valuable information is not available
to certain people because of security classifications,
for example?

2. How critical is the impedance? To what extent
does it really impair progress and understanding?

3. What possibilities are there for reducing or
compensating for the impedance?

4. How justifiable are these impedances in view of the
value of information, or do they exist because of
the value of information?

Fig. 2. Generalized information system model (4)


Improved insight on these and related topics would be
very worthwhile. Consideration might be given to the
following different levels of restrictions (5):
(1) Unclassified/Public Domain, (2) Unclassified/Copyrighted,
(3) Personally Confidential, (4) Proprietary,
(5) Security Classified, (6) Natural Language Discrepancy,
(7) Personal limitations in written or verbal
expression, (8) Expense (costs).





• Some Specific Areas for Response


With the general model of information transfer serving
as the underlying logical structure, a great many areas
are made available for consideration. This section of the
theme paper attempts to provide some preliminary inroads
to some of these subject areas with the objective
of promoting development of a full spectrum of papers
on specific topics within this general framework. Such
papers will be the heart of the technical program of the
1968 ADI Convention. The following discussions are not
intended to inform but rather to prompt thought and
encourage response. Authors may wish to discuss concepts
that can apply at any point or combination of
points in the transfer spectrum. A host of ideas for
papers is inherent in our previous presentation of the
general model of information. Some areas worthy of
additional specific mention are:

• Cost/Performance/Benefit Interrelationships
• Functions Performed Within the Channels
• Scientific and Technical Disciplines Involved in
  Information Science and Technology
• Current Areas of Research
• Vocabulary Control/Language Processing
• Optimum Channel Utilization.


COST/PERFORMANCE/BENEFIT INTERRELATIONSHIPS


As a sounding board for further discussion, we offer the 
following hypotheses concerning the interrelationships 
between costs, performance, and benefits of information 
systems. The term, costs, in this discussion simply refers 
to the costs involved in operating an information system. 
However, the clear definition of the other two terms is 
more critical to a clear understanding of the following
discussion. 

Performance. The term, performance, comprises the 
combination of five factors: 


1. Coverage: the extent to which an information
system covers all applicable information. There is
inherently specified a theoretical finite portion of the
total field of information which applies to the scope
and mission of a system. Performance includes a
measure of the completeness of coverage of that portion
of information.

2. Usage: the extent to which the system serves all
the information needs existing within its scope and
mission. There is inherently defined a theoretical finite
portion of the total need for information which is
able to be satisfied by the system. Performance includes
a measure of the completeness of satisfaction
of that portion of the total information need.

3. Accuracy: the degree of perfection with which
the system can fit applicable information to specific
expressions of need. This factor involves the familiar
measures of relevance and recall.

4. Speed: the speed with which the system can
perform its functions.

5. Output Quality: quality of products and/or services
offered to the system users.
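
The "familiar measures of relevance and recall" named under Accuracy can be computed as in the following minimal sketch. It assumes the usual definitions (recall is the share of all relevant items that were retrieved; the relevance ratio, now more often called precision, is the share of retrieved items that are relevant). The function name and the document identifiers are hypothetical.

```python
def recall_and_relevance(retrieved: set, relevant: set) -> tuple:
    """Return (recall, relevance ratio) for one search."""
    hits = len(retrieved & relevant)  # relevant items actually retrieved
    recall = hits / len(relevant) if relevant else 0.0
    relevance = hits / len(retrieved) if retrieved else 0.0
    return recall, relevance


# Hypothetical search: four items retrieved, three items truly relevant.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
recall, relevance = recall_and_relevance(retrieved, relevant)
print(recall, relevance)  # recall is 2/3, relevance ratio is 1/2
```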


Benefits. The term “benefits” is expressible in terms
such as:


1. The extent to which all inadvertent duplication of 
effort can be prevented. 

2. The extent to which the planning and decision- 
making functions of any organization can be 
improved. 

3. The extent to which synthesis of new ideas can
be fostered through the manipulation and observation
of information contained in an information
system.


From the above definitions we see that performance
of a system is a function of factors internal to, or
controllable by, the system. This is in contrast to the factors
bearing on benefits. These are external to or beyond the
control of the information system.

Fig. 3. Cost-performance relationship (cost of operating the
information system plotted against percentage of maximum
possible performance level)

Fig. 4. Benefit-performance relationship (benefits yielded through
operation of the information system plotted against percentage
of maximum possible performance level)

Cost-Performance Relationship. Figure 3 depicts the
relationship between cost and performance. Our hypothesis
defines two basic characteristics of the interrelationship:

1. At zero performance level, the cost of operating 
the system is also zero. 

2. As the performance level approaches 100%, the 
cost of operating the system approaches in- 
finity. 
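
As an illustration only, one simple curve that has both characteristics is cost = scale × p / (1 − p), where p is the performance level as a fraction of the maximum. This functional form and the `scale` parameter are assumptions chosen for the sketch; the hypothesis itself fixes only the two end conditions.

```python
def operating_cost(performance: float, scale: float = 1.0) -> float:
    """Assumed cost curve: zero at p = 0, unbounded as p approaches 1."""
    if not 0.0 <= performance < 1.0:
        raise ValueError("performance must lie in [0, 1)")
    return scale * performance / (1.0 - performance)


# Cost rises slowly at first and then very steeply near 100% performance.
for p in (0.0, 0.5, 0.9, 0.99, 0.999):
    print(p, operating_cost(p))
```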

Benefit-Performance Relationship. The relationship
between benefits and performance is shown in Fig. 4.
As the performance level increases, there will be a
diminishing increment of benefit to be derived from each
additional increment of performance, a tendency to
approach a point of diminishing returns.

Benefit-Cost-Performance Relationship. Figure 5 presents
the relationship between all three variables by
plotting the benefit to cost ratio against performance.
The benefit to cost ratio is similar in concept to return
on investment. The shaded area represents conditions
under which no information system should operate,
because in this area it always costs more to operate the
system than can be derived from it in the form of benefits.
Curve C depicts an information system in a situation
where there is no level of performance at which it can
operate to produce a positive return on investment. Such
a system would be completely unjustifiable. To operate
optimally would be to operate at that level of performance
(Point B) at which the system achieves the maximum
benefit to cost ratio (Point X).

Fig. 5. Benefit-cost-performance relationship (benefit to cost
ratio plotted against percentage of maximum possible
performance level)
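
The search for the optimal operating point can be sketched numerically. Everything specific below is an assumption made for illustration: the cost curve p / (1 − p), the saturating benefit curve (chosen so that the ratio has an interior maximum), the parameter names, and the numeric values. The figures give only qualitative shapes.

```python
def cost(p: float, scale: float = 1.0) -> float:
    """Assumed cost curve: zero at p = 0, unbounded as p approaches 1."""
    return scale * p / (1.0 - p)


def benefit(p: float, ceiling: float = 2.0, half: float = 0.5) -> float:
    """Assumed benefit curve that levels off toward `ceiling`."""
    return ceiling * p * p / (p * p + half * half)


def best_operating_point(ceiling: float = 2.0, scale: float = 1.0,
                         steps: int = 100) -> tuple:
    """Grid search for the performance level with the highest ratio."""
    grid = [i / steps for i in range(1, steps)]  # excludes 0 and 1
    best_ratio, best_p = max(
        (benefit(p, ceiling) / cost(p, scale), p) for p in grid
    )
    return best_p, best_ratio


p, ratio = best_operating_point()
print(p, ratio, ratio > 1.0)  # a ratio above 1 lies outside the shaded area

# Halving the attainable benefit gives a system like Curve C:
# its best ratio never reaches 1.
print(best_operating_point(ceiling=1.0))
```

In this sketch the first call corresponds to Point B (the performance level) and Point X (the ratio reached there); the second call shows a system that cannot be justified at any performance level.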

Cost-Performance Optimization. In Fig. 6, if Curve A
represents the cost-performance relationship of an existing
system, attempts at improving the system design
toward optimum conditions (or designing the optimum
system) can be represented as trying to “dent-in” the
curve to arrive at a curve more like Curve B. This
“denting-in” of the cost/performance curve can be
accomplished by:

1. Devising ways to decrease costs without decreasing
performance (as in moving from Point 1 to Point 2
in Fig. 6).

2. Devising ways to improve performance without
increasing costs (as in moving from Point 1 to Point 3).

3. Devising improvements which combine Items (1)
and (2) above (as in moving from Point 1 to Point 4).

Benefit-Cost-Performance Optimization. In Fig. 7, if
Curve A represents the relationships for an existing system,
then attempts at improving the system toward
optimum conditions can be represented as trying to increase
the value of the maximum benefit to cost ratio,
regardless of the performance level at which the maximum
ratio would occur. Examples of such improvements
are depicted in the figure as increasing the maximum
ratio from Point 1 to any of the Points 2, 3, or 4.

Fig. 6. Cost-performance optimization (cost of operating the
information system plotted against percentage of maximum
possible performance level)

Fig. 7. Benefit-cost-performance optimization (benefit to cost
ratio plotted against percentage of maximum possible
performance level)

The prime difficulty with making the above hypotheses
a working tool is the elusive nature of the measurability
of the factors involved. Take, for example, the cost
factors. It seems a straightforward problem to measure
costs of an information system. However, if the concept
of “system” is extended (as it probably should be) to
include the users and their costs of “doing business,” the
measurement of costs becomes very difficult. In that
case the cost of not operating a system is not zero
because the costs to the user of not having a system
would have to be accounted for. The curve in Fig. 3
might, instead, be “U” shaped. Additional measurement
problems include:


1. How do you measure the parameters of performance:
coverage, usage, accuracy, speed, and
quality of products?

2. How are benefits to be detected if they occur
externally to the system?

3. How are benefits to be measured if they can be
detected?


Papers on cost/effectiveness are heartily encouraged. 
Convincing management to spend increasing sums of
money for information systems will become increasingly
difficult without means for tangible dollar justification.


FUNCTIONS PERFORMED WITHIN THE CHANNELS 


Within each channel, there is a variety of functions 
performed to make the channel operative. The com- 
parison offered in Table 1 seems to indicate a fairly 
high degree of agreement between Wall (6), Simpson 
(7), and Berul (1) in identifying the nature of these
functions. A much more generalized expression of functions
was suggested by Ben-Ami Lipetz in a lecture
before an ADI seminar in Columbus, Ohio, early in 
1967. He offered the view that all of these functions 
can be categorized into three general types: (1) Matching 
of records, (2) Movement or physical displacement of 
records, (3) Creation of new records from old records. 
All the aspects of system functions, including those served
by hardware and software, continue to provide important
areas for research and development and thus
fruitful topics for discussion within the framework of
information transfer for the 1968 ADI Convention.


Table 1. Functions performed within information transfer channels

Wall (6)               Simpson (7)
Acquisition            Acquisition
Surrogation            Abstract preparation and dissemination
Announcement           Accession list preparation and dissemination
Index operation        Index preparation and dissemination
Document management    Storage
                       Retrieval
                       Dissemination
Correlation            Bibliography preparation
                       Answering technical questions
                       Analytical studies
Vocabulary control     Reference searching
                       Referral services

Berul (1)              Lipetz
(Origination)          Physical comparison of records (matching)
Acquisition
Surrogation
  • Cataloging
  • Abstracting
  • Indexing
Announcement           Movement or physical displacement of records
Index operation
Document management
Retrieval
Dissemination
                       Creation of new records from old records


Table 2. Scientific and technical disciplines involved in information science and technology

Type of Endeavor

1. Theory Development. This field involves efforts toward
building theory under the wide variety of practices that
have empirically developed as a result of the pressing necessity
for operating information systems.

2. System Design. Research in this field would be directed to
making the design of information systems a systematic
process.

3. Human Replacement. The intellectual effort by humans
continues to be the major cost factor in information transfer
systems. This field encompasses all of the efforts to develop
automatic techniques to replace human intellectual processes.

4. Language Accommodation. This field covers all sorts of
techniques and devices required to accommodate the fact
that languages are very inexact, and to make information
transfer systems work in spite of that fact.

5. System Operation. Research in this field would encompass
all efforts toward improved efficiency and effectiveness of
the operation of information systems.

6. Philosophical Development. Efforts in this field would be
directed specifically at the frontiers of information science:
thought transmission, programmed learning, bionic applications.
For example, an “information transfer philosopher”
might ask such provocative questions as “Isn’t it possible
that the techniques of reading and writing are becoming
obsolete as information transfer techniques?” Literally any
and all disciplines will likely come into play in exploring the
philosophical frontiers.

7. Economics. Emphasis in this field would be on the cost/benefit
aspects of information transfer.

8. Language Redesign. This field is directed toward the evolution
of an exact language to serve at least as the system
language of information transfer systems and, perhaps, for
extended use by authors and in other aspects of scientific
communication.

9. Human Factors. Information transfer systems will remain
man-machine systems for many years to come despite efforts
in Field (3) above. This field will encompass efforts to improve
the understanding and efficiency of the human aspects
of and contributions to information handling.


SCIENTIFIC AND TECHNICAL DISCIPLINES INVOLVED IN
INFORMATION SCIENCE AND TECHNOLOGY


The variety of efforts involved in research, development,
technical services, consulting, and operations concerned
with information transfer requires inputs from a
number of different scientific disciplines. As examples,
nine areas of consideration serve to illustrate the diverse
disciplinary contributions needed to attack the problems.
Table 2 presents nine areas of endeavor and their
associated disciplines. Papers discussing any of the myriad
sorts of the applications of scientific and technical disciplines
to the problems of information transfer would
be valuable contributions.


CURRENT AREAS OF RESEARCH


From many corners are heard comments deploring
the wide gap between research efforts and practice in 
the field of information science and technology. Many 
people find it very difficult to foresee how the products 
of current research in the field will find their way into 
real-life applications of a practical nature. For such 
people, any efforts to close the breach between researchers 
and practitioners would be welcome contributions. Fig- 
ure 8 presents an example of such an effort. It attempts 
to show how techniques such as: 


• Automatic abstracting
• Automatic indexing
• Character (pattern) recognition
• Machine translation
• Automatic speech analysis


(presently in various phases of research) may fit into
the fundamental document handling function of processing
documents to produce indexes. Figure 8 also indicates
those techniques which are now operational,
those which are nearing practical application, and those
which have more or less “blue-sky” status at the present
time. Not covered by Fig. 8 are all of the various types
of research methods which will be contributory to producing
workable techniques of these types. For the 1968
ADI Convention, papers on current research will be
very much in order, especially in two areas: (1) Papers
presenting specific current research efforts; (2) Papers
correlating such research efforts with eventual practical
application as exemplified by Fig. 8.

Fig. 8. Current research applications


VOCABULARY CONTROL/LANGUAGE PROCESSING


All of the channels illustrated in our general model of 
information transfer are troubled with language dif- 
ficulties. The language of the items of information 
entering any of the channels is not likely to provide a 
high level of similarity to the user’s language to which 
the output from the channel must attempt to respond. 
Thus, there is usually a translation problem between 
input to and output from any of the channels. 

At its worst, the translation problem will involve the
conversion from one natural language to another. However,
even when channel input and output are expressed
in the same natural language, the correct matching of 
input and output ideas is plagued by a number of 
language problems: 


• Semantics: the problem of word meanings, including
  both synonyms (groups of words all having
  the same meaning) and homographs (single words
  each having more than one meaning)

• Generics: the problem of hierarchical word families

• Viewpoint: the problem of varying contexts as a
  result of varying viewpoints

• Term preconjunction: exemplified by the choice between
  the separate terms FLOW and RATE or
  the preconjoined term FLOW RATE as means for
  indexing a concept.
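
Two of these problems, synonyms and term preconjunction, lend themselves to a short sketch of how a controlled vocabulary maps both indexing words and query words onto one system language. The thesaurus fragment, the names `USE`, `PRECONJOINED`, and `normalize`, and the sample terms other than FLOW RATE are hypothetical and chosen only for illustration.

```python
# Hypothetical thesaurus fragment: entry term -> preferred term.
USE = {"DISCHARGE": "FLOW RATE"}

# Hypothetical preconjoined terms, keyed by their component words.
PRECONJOINED = {("FLOW", "RATE"): "FLOW RATE"}


def normalize(words: list) -> list:
    """Map a sequence of words onto controlled (system-language) terms."""
    terms = []
    i = 0
    while i < len(words):
        pair = tuple(w.upper() for w in words[i:i + 2])
        if pair in PRECONJOINED:      # join FLOW + RATE into FLOW RATE
            terms.append(PRECONJOINED[pair])
            i += 2
        else:
            terms.append(words[i].upper())
            i += 1
    return [USE.get(t, t) for t in terms]  # replace synonyms


index_terms = set(normalize(["discharge", "measurement"]))
query_terms = set(normalize(["flow", "rate"]))
print(sorted(index_terms))         # ['FLOW RATE', 'MEASUREMENT']
print(query_terms <= index_terms)  # True: the query matches the item
```

Without the normalization step the item indexed under DISCHARGE would not be found by a query phrased as FLOW RATE, which is the recall loss that vocabulary control is meant to prevent.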


These language problems, it is claimed, produce adverse
effects on the recall/relevance characteristics of
information systems unless properly controlled. In many
systems, the means of control has been the intellectually
produced thesaurus. Rules for the intellectual construction
of thesauri have been published by the Engineers
Joint Council (8). Figure 9 depicts the parallel
nature of the relationships between vocabulary control
functions and the input and output functions of a typical
information storage and retrieval system. In essence,
the thesaurus creates a “system language” which is
capable of translating or “understanding” both the
language of the input items and the language of the
user which is required for efficient output.

Fig. 9. The interactions between vocabulary control and the
input/output elements of an information storage and retrieval
system.

American Documentation, October 1967

Is the expensive process of intellectual thesaurus
construction really necessary for obtaining good system
performance? The second phase of the Cranfield Project
(9) provides some evidence (and it is possible we may
be oversimplifying our interpretation of the results) that
the simpler the indexing language, the better the recall/relevance
performance of a system, as shown in Fig. 10.
If the Cranfield results may be extrapolated to apply
generally to all information systems, the need for elaborate
thesauri may evaporate.

The term “language processing” seems to represent a
much broader scope of consideration than the concept of
vocabulary control discussed above. Robert F. Simmons
(10), in the 1966 Annual Review of Information Science
and Technology, organized his discussion of automated
language processing as follows:


• Computational Linguistics
  (1) Linguistic Theory
  (2) Semantic Theory
  (3) Psycholinguistics
  (4) Automated Syntactic Analysis Systems
• Applications Studies
  (1) Mechanical Translation
  (2) Automated Question Answering
  (3) Stylistic and Content Analysis








We feel that contributions in these areas and other
areas dealing with language will provide many of the
fundamental stepping stones to future improved methods
for expressing ideas and concepts, for converting such
expressions into storable/manipulable form, and for
analyzing and correlating elements of information and
thus synthesizing them into new usable intelligence.
These are functions which provide the underlying framework
for improved information transfer.


Fig. 10. Recall-relevance-performance characteristics (axes:
recall and relevance)


OPTIMAL CHANNEL UTILIZATION 


Figure 1 illustrates well that the user of information 
may have several options available to him when he has 
the need to obtain information. His choice may be 
limited by his resources or those of his organization. 
Often, however, the options are limited by the lack of 
awareness of the individual or his organization of the 
options available. There is also the possibility that the 
individual or his organization desires to improve the 
availability of information but hesitates to invest the
capital into the development of this capability because 
of uncertainties in the value of the results to be obtained 
or in the choice of what system is best. 

For example, most organizations when choosing to 
supply their members with assistance often establish 
libraries plus several services or specialized activities in 
addition to the library. Assume that, for dealing with 
published (or report) literature, an organization decides 
to provide additional specialized services to its members. 
The library, to meet this requirement, usually will pro- 
cure hardware or services to deal with the published 
literature in an overall sense, such as classifying journals 
instead of articles in journals. To provide in-depth in- 
dexing, the library will likely increase its subscription to 
commercially available secondary journals and indexes. 
When a member of the organization develops a need 
beyond the commercially available services, then spe- 
cialized storage and retrieval mechanisms are procured. 
In many cases, members of special programs and projects 
with extensive information requirements develop their 
own systems. In other cases, the management of the 
organization will authorize the development of large- 
scale mechanized information programs using computers, 
microimaging services, or other mechanisms. The multiple
channels that may be used, and the variation of approaches
within each channel, coupled with the inability
to show (in quantitative terms) return on investment,
pose some interesting questions on optimizing channel 
utilization. The problem of choosing the optimum means 
within each channel is also a serious systems study. The 
committee encourages the preparation of papers on the 
problem of optimal channel selection and the associated 
problems of choice within channels. 


• Epilogue: Call for Papers


This paper has set the theme and procedure of the 
1968 ADI Annual Meeting in Columbus, Ohio, October 
20-24, 1968. Those persons who intend to submit papers 
should notify David M. Liston, Jr., Battelle Memorial 
Institute, 505 King Avenue, Columbus, Ohio 43201 of 
their intent by March 1, 1968. It would be helpful if 
the subject of the intended paper could be given at this
time and if possible the specific area of the general model 
of information transfer to which it will relate. A guide
for authors will be sent to these persons immediately
upon receipt of the notification of intent. Manuscripts 
must be received by D. M. Liston by May 1, 1968. 
Each person submitting a paper will be notified by July
1, 1968 whether his paper has been accepted.


Computer Usage in the Development of a
Water Resources Thesaurus*


DAVID F. HERSEY and WILLIAM HAMMOND†


This paper describes one method by which a
thesaurus has been developed making extensive use
of a computer to supplement the intellectual effort.
The computer techniques incorporate several exceptionally
useful innovations not previously disclosed
in the open literature. The thesaurus which was developed
is similar in convention and format to that
recently adopted for Project LEX (4).


Introduction


Since the publication of the ASTIA Thesaurus of
Descriptors in 1960 (1), the general trend among the
large information facilities has been toward more rigid
controls of their subject indexing vocabularies (2). This
trend was given added emphasis with the publication of
the Engineers Joint Council’s Thesaurus of Engineering
Terms (3) in 1964. The conventions and format of this
latter publication have recently been adopted, with minor
exceptions, by the Department of Defense (4).

In view of this widespread trend toward the employment
of thesauri conforming to commonly accepted conventions
and format, it is believed that the recent experience
of the Science Information Exchange (SIE) of the
Smithsonian Institution in developing such a thesaurus
will be a timely contribution to the state-of-the-art.
This paper is in no way concerned with the thesaurus
as a functional document, there being many advantages
and disadvantages to this and other systems of indexing
and retrieval. Its purpose is to review the recent development
of a thesaurus for water resources research. Particular
emphasis in this paper is given to the novel
approach to thesaurus construction in which the intellectual
contribution was manipulated by the computer
to produce a thesaurus similar to EJC and LEX conventions
and format.

The present work began in the fall of 1965 under a
contract between the Science Information Exchange and
the Office of Water Resources Research (OWRR) of
the Department of the Interior, and resulted ultimately
in a publication, Water Resources Thesaurus (6). The
project was undertaken at the request of the OWRR to
develop a word list which might prove useful to them
and others working in the field of water resources research.
Thus, it was essential that the selection and
display of terms in a water resources thesaurus be
compatible with the generally accepted usage of the
research terminology in that field, as well as methods of
indexing (6) (7) currently in use in the cataloging of
water resources research.

* This work was supported by the Office of Water Resources Research,
Department of the Interior, Contract #14-01-0001-720 and carried out
under the senior author who served as project director.

† Respectively, Science Information Exchange, Smithsonian Institution,
Washington, D.C. 20086; and ARIES Corporation, Westgate Research
Park, McLean, Virginia 22101.


• Methodology: General


The development of the thesaurus was the work of 
a large number of people. There were four general cate- 
gories of participants: the SIE staff scientists; the 
lexicographic consultants from Battelle Memorial Insti- 
tute; general scientific and information specialist-type 
consultants; and SIE and ARIES Corporation computer 
programmers and consultants who participated in the 
computer aspects of the project. The principal thrust 
in the present paper involves the detailed description 
of the effort of the latter participants. It is necessary 
to devote some space to a brief description of the intellectual
effort which preceded the computer manipulations
so that the actual role of the computer can be more
clearly visualized.

A preliminary vocabulary for the thesaurus was derived 
from the current SIE word lists (6) and from terms 
used in the indexing of projects in the Water Resources 
Research Catalog (8) (9). To these were added words 
obtained from government and nongovernment contributors
who were asked to supply additional terms of value
to individuals in their own fields of specialized interest,
but using terms of general interest to workers in the 
broad area of water resources research. The real challenge
here, of course, was to take water resources research,
which contains a myriad of multidisciplinary aspects,
and to develop a word list which would be helpful to
users at various levels of sophistication. The final selection
of main terms, and the determination of the "thesaurus
relationships" among those terms, was made by
a limited number of individuals who carried the burden
of exercising the necessary intellectual judgment. These
individuals, however, were completely conversant with
the subject area, and the intended use, for which the
thesaurus was being developed.

As each term was selected, a lead-term work card 
was prepared as shown in Fig. 1. Each lead term was
then keypunched according to the instructions contained
in Table 1 and the example illustrated in Fig. 2. Upon
completion of this task, the punched cards were sorted
alphabetically and printed out by computer. A seven-digit
numeric lead-term code was assigned manually,
leaving gaps for insertion of additional terms.
This code was then entered on the term work card and
keypunched into the lead-term punched cards. The
punched cards were then sorted into letter-by-letter
sequence (utilizing the numeric sequencing codes) and a
second printout was made. Subterms, together with
their numeric sequencing codes, were then added to the 
term work cards. The subterms included other terms 
in the vocabulary determined to be broader than (BT), 


[Fig. 1 work card entries: TERM AQUATIC ANIMALS; UF AQUATIC INVERTEBRATES; NT FISH, BLUE GILLS, [illegible] FISH, MINNOWS, COMMERCIAL FISH, BASS, MARINE FISH, MULLET, SHAD, SALMON, SPORT FISH]


narrower than (NT), related to (RT), or use references 
(USE) for respective lead terms. The subterm entries 
were keypunched according to the instructions contained 
in Table 1, sorted together with the lead-term punched 
cards into letter-by-letter sequence and a preliminary 
edition of the thesaurus was printed out on the computer 
for review and edit. 


* Computer Role—General 


The computer served a twofold purpose. It was used 
to maintain the integrity of the intellectual decisions 
reached concerning sequence coding, spelling, and im- 
mediate generic relationships among the terms. The 
computer was also used to generate the implicit generics 
among the terms throughout the thesaurus and to display 
these relationships for intellectual review. 

One set of programs was written to edit the preliminary 
thesaurus compilation to insure that the initial corpus 
was in compliance with the given conventions and speci- 
fications. A second computer application was designed 
and a set of programs was written to provide a com- 
puterized capability for updating and maintaining the 
thesaurus on magnetic tape and for reproduction copy. 

Essentially, the computer performed the following 
functions: 

1. Edit for data format. 


2. Edit for consistency in spelling and sequence coding. 
3. Generate direct reciprocals for all subterm entries. 


[Fig. 1 work card entries, continued: BT ANIMALS; RT BENTHOS]

Fig. 1. Lead term work card
Fig. 2. Keypunched lead term


4. Generate "generic trees" for each of the main term
entries; i.e., when a broader subterm (BT) is listed, all
terms broader than it are also listed as BT subterms
for the given main entry; when a narrower subterm (NT) is
listed, all terms narrower than it are also listed as
NT subterms for the given main term entry.

5. Eliminate duplication and conflicting thesaurus relationships
among the subterm set for a given main term
entry.

6. Tag all terms in the file that are not of the lowest
generic level, i.e., terms having narrower subterms
listed in the thesaurus.

7. Generate major subject category groupings for the 
terms on the basis of the generic tree structures dis- 
played in the thesaurus. 
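Functions 3 and 4 of the list above can be sketched in a few lines. The sketch is written in Python purely for illustration; the dictionary layout, the function names `add_reciprocals` and `expand_generics`, and the sample chain of broader terms are assumptions and are not taken from the original programs.

```python
from collections import defaultdict

# Reciprocal relation for each thesaurus relation type.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def add_reciprocals(entries):
    """Return a copy of `entries` with every direct reciprocal present.

    `entries` maps lead term -> relation -> set of subterms.
    """
    out = defaultdict(lambda: defaultdict(set))
    for lead, rels in entries.items():
        for rel, subs in rels.items():
            for sub in subs:
                out[lead][rel].add(sub)
                out[sub][RECIPROCAL[rel]].add(lead)
    return out


def expand_generics(entries, rel):
    """For each lead term, collect every term reachable through `rel`
    (BT or NT), not only the immediate ones."""
    expanded = {}
    for lead in entries:
        seen, stack = set(), list(entries[lead].get(rel, ()))
        while stack:
            term = stack.pop()
            if term not in seen and term != lead:
                seen.add(term)
                stack.extend(entries.get(term, {}).get(rel, ()))
        expanded[lead] = seen
    return expanded


# Hypothetical input: only the immediate broader term is supplied.
direct = {
    "BARLEY": {"BT": {"CEREAL CROPS"}},
    "CEREAL CROPS": {"BT": {"CROPS"}},
    "CROPS": {"BT": {"PLANTS"}},
}
full = add_reciprocals(direct)
all_bt = expand_generics(full, "BT")
all_nt = expand_generics(full, "NT")
```

Supplying only the immediate BT for each term is enough here: the reciprocal NT entries and the complete BT and NT sets follow mechanically.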


Once the initial thesaurus edit has been completed and
the conventions and specifications satisfied, terms
can be added or deleted with assurance that the integrity
of the "thesaurus relationships" among the terms in the
vocabulary will be maintained.

Terms can be deleted on matching sequence code. The 
main entry record together with all its subterms, and all 
references to the term throughout the thesaurus can be 
deleted as a single maintenance action. A new term to 
be added must carry its immediate BT's or its immediate
NT's and its RT's (immediate BT's are most desirable
from the computer "logistics" viewpoint since more than
one card would seldom be required to construct the
generic tree for a new entry).

For the new term entry, the computer will automatically
form the direct reciprocal and then generate the
implicit generics (generic "trees") among the other terms
in the thesaurus as well as the new terms being added.
In the course of the maintenance run, the computer will
eliminate conflicting thesaurus relationships among the
subterm entries and will adjust the generic tag throughout
the thesaurus to conform to any change in the generic
status among the terms resulting from the update operation.


* Specific Application—Thesaurus Construction 


The thesaurus construction (cleanup) programs were 
originally intended for one-time use. They were designed 
for processing only a single step at a time as work pro- 
gressed on the initial compilation of the Water Resources 
Thesaurus. In the actual operation, however, a pre- 
liminary edition of the thesaurus, as discussed earlier, 
had been compiled and converted to punched cards before 
the computer application was written. For this reason, 
it appeared desirable to retain the existing punched card 
format for subsequent computer applications. This card
format carries the terms and their numeric sequence 
codes, a numeric and alpha coding for the term relation- 
ship (BT, NT, etc.) and a numeric code for sequentially 
arranging successive scope note lines—all according to 
the instructions contained in Table 1. 
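As an illustration of the card format just described, the following sketch splits one 80-column UF/NT/BT/RT card into its fields using the column positions given in Table 1. It is a modern Python reading of the layout, not the original program; the function name `parse_subterm_card` and the dictionary keys are hypothetical.

```python
# Column 1 punch -> relation, as given in Table 1.
RELATION_BY_PUNCH = {"4": "UF", "5": "NT", "6": "BT", "7": "RT"}


def parse_subterm_card(card):
    """Split one 80-column UF/NT/BT/RT card into its Table 1 fields."""
    card = card.ljust(80)[:80]
    kind = card[0]                            # column 1
    if kind not in RELATION_BY_PUNCH:
        raise ValueError(f"not a UF/NT/BT/RT card: column 1 = {kind!r}")
    record = {
        "relation": RELATION_BY_PUNCH[kind],
        "lead_code": card[1:8],               # columns 2-8
        "subterm_code": card[8:15],           # columns 9-15
        "card_number": card[15],              # column 16
        "relation_text": card[18:20],         # columns 19-20
        "description": card[21:80].rstrip(),  # columns 22-80
    }
    if record["relation_text"] != record["relation"]:
        raise ValueError("columns 19-20 disagree with the punch in column 1")
    return record


# A BT card built from the codes shown in the tape example of Table 3.
sample = "6" + "1623500" + "2183200" + "1" + "  " + "BT" + " " + "CEREAL CROPS"
parsed = parse_subterm_card(sample)
```

Checking columns 19-20 against the punch in column 1 is one small instance of the "edit for data format" function.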

The numeric sequence codes were intellectually derived 
to provide letter-by-letter sequencing except for preced- 
ing numerics, which were ignored. Gaps were retained 
in the numeric codes between succeeding terms to permit 
insertion of new terms. 
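The gapped sequence codes can be illustrated as follows. In the project the codes were assigned manually; this Python sketch only shows the idea. The gap size of 100, the function names, and the sample terms are assumptions, and the filing key simplifies the stated rule by ignoring every non-letter rather than only preceding numerics.

```python
def filing_key(term):
    """Letter-by-letter key: keep letters only, so spaces, punctuation,
    and digits are all ignored (a simplification of the stated rule)."""
    return "".join(ch for ch in term.upper() if ch.isalpha())


def assign_codes(terms, gap=100):
    """Give terms ascending seven-digit codes, leaving `gap` between them."""
    ordered = sorted(terms, key=filing_key)
    return {term: f"{(i + 1) * gap:07d}" for i, term in enumerate(ordered)}


def code_between(low, high):
    """Pick a code midway between two neighbours; fail if the gap is used up."""
    lo, hi = int(low), int(high)
    if hi - lo < 2:
        raise ValueError("no gap left; neighbouring terms must be recoded")
    return f"{(lo + hi) // 2:07d}"


codes = assign_codes(["WATER TABLE", "WATERSHEDS", "WATER"])
# A later addition, e.g. WATERFOWL, files between WATER and WATERSHEDS.
new_code = code_between(codes["WATER"], codes["WATERSHEDS"])
```

Once a gap is exhausted the neighbouring terms have to be recoded, which is one reason manual sequence coding becomes cumbersome during maintenance.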

In both the initial construction of the thesaurus corpus 
and in subsequent maintenance operations, step-by-step 
processing must be accomplished in precisely the sequence 
that is specified. Indicated corrections to the file from 
one step must be made before proceeding to the next 
step. Errors in data fields and inconsistencies in coding 
and spelling detected by the computer edit must be 
corrected before the missing reciprocals can be generated. 
The missing reciprocal entries detected by the computer 
TABLE 1. Thesaurus card formats

1. Lead Term Card Punching
   Column 1        Punch "1"
   Columns 2-8     Punch term code for lead term
   Columns 9-15    Skip (i.e., leave blank)
   Column 16       Punch "1"; if alphabetic term description exceeds
                   Columns 17-80, punch a second card, duplicating in it
                   Columns 1-16, punch "2" in Column 16, and continue
                   punching alphabetic term description in Columns 17-80
                   of second card
   Columns 17-80   Punch alphabetic term description

2. Explanatory Note for Lead Term Card Punching
   Column 1        Punch "2"
   Columns 2-8     Punch term code for lead term
   Columns 9-15    Skip
   Column 16       Punch "1"; if alphabetic description for explanatory
                   note for lead term exceeds Columns 19-80, etc., follow
                   Column 16 instruction for Lead Term card punching
   Columns 17-18   Skip
   Columns 19-80   Punch alphabetic description for explanatory note for
                   lead term

3. "Use" Reference Card Punching
   Column 1        Punch "3"
   Columns 2-8     Punch term code for lead term
   Columns 9-15    Punch term code for the "use" reference
   Column 16       Punch "1"; if alphabetic description for "use"
                   reference exceeds Columns 23-80, punch a second card,
                   duplicating in it Columns 1-15, punch "2" in Column 16,
                   and continue punching in it alphabetic description of
                   "use" reference in Columns 23-80 of second card
   Columns 17-18   Skip
   Columns 19-21   Punch in word "use"
   Column 22       Skip
   Columns 23-80   Punch alphabetic description for "use" reference

4. "Used for" (UF) Reference Card Punching
5. "Narrower Term" (NT) Card Punching
6. "Broader Term" (BT) Card Punching
7. "Related Term" (RT) Card Punching
   Column 1        Punch "4" for "used for" (UF) references
                   Punch "5" for "narrower terms" (NT)
                   Punch "6" for "broader terms" (BT)
                   Punch "7" for "related terms" (RT)
   Columns 2-8     Punch term code for lead term
   Columns 9-15    Punch UF, NT, BT, or RT term code, as indicated by
                   punch in Column 1
   Column 16       Follow sequencing punching pattern explained for
                   preceding instructions if UF, NT, BT, or RT alphabetic
                   description exceeds Columns 22-80
   Columns 17-18   Skip
   Columns 19-20   Punch UF, NT, BT, or RT as indicated by punch in
                   Column 1
   Column 21       Skip
   Columns 22-80   Punch alphabetic description for UF, NT, BT, or RT
                   reference, as indicated



must be added to the thesaurus corpus and all conflicts 
in subterm relationships must be resolved before the 
generic expansion for the BT-NT subterm entries can 
be made. By conflicts of this sort, we refer to a given 
subentry having more than one relation (BT, NT, RT, 
USE, UF) to its corresponding lead-term entry—or added 
subterms following a USE entry. The generic expansion 
program will generate the implicit hierarchical structure 
among the BT-NT entries and will assure that a complete
hierarchical BT-NT list is displayed under each lead 
entry in the thesaurus. 
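The two kinds of conflict defined above can be detected with a simple grouping pass. The following Python sketch assumes each record is a (lead term, relation, subterm) triple; that layout, the function name `find_conflicts`, and the sample records are illustrative assumptions.

```python
from collections import defaultdict


def find_conflicts(records):
    """Report (a) subentries carrying more than one relation to the same
    lead term and (b) lead terms that have a USE entry plus other subterms."""
    records = list(records)

    relations = defaultdict(set)
    for lead, rel, sub in records:
        relations[(lead, sub)].add(rel)
    multiple = {pair: rels for pair, rels in relations.items() if len(rels) > 1}

    by_lead = defaultdict(set)
    for lead, rel, _ in records:
        by_lead[lead].add(rel)
    use_with_extras = sorted(
        lead for lead, rels in by_lead.items() if "USE" in rels and len(rels) > 1
    )
    return multiple, use_with_extras


# Hypothetical records containing one conflict of each kind.
sample_records = [
    ("FISH", "NT", "BASS"),
    ("FISH", "RT", "BASS"),
    ("FISH", "BT", "ANIMALS"),
    ("BLUEGILLS", "USE", "BLUE GILLS"),
    ("BLUEGILLS", "BT", "FISH"),
]
multiple, use_with_extras = find_conflicts(sample_records)
```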

When all the above steps have been completed and 
corrective actions taken, the computer can tag terms that 
are not of the lowest generic level whenever they are
displayed in the thesaurus.


* Thesaurus Maintenance 


As originally conceived, the maintenance application 
would permit sufficient updating of a 10% or smaller 
increment to the thesaurus. In the actual processing, 
the programs were utilized for rather sizeable additions 
and numerous shiftings of term relationships. In one
instance, this resulted in a 100% expansion of the the- 
saurus corpus. The program design specifications were 
sufficiently flexible to handle this type of operation, but 
when used in this manner were little, if any, more efficient 
than the equivalent cleanup programs. As a general rule, 
in an operational environment thesaurus updating and 
republishing would be infrequent. Therefore, simplicity 
of application was emphasized rather than computer 
efficiency.

The punch card input formats utilized in the thesaurus
construction application were retained for maintenance,
except that one field was added to record the maintenance
operation desired. Table 2 contains instructions
for preparation of the maintenance input.

In actual practice, maintenance was made unduly 
cumbersome because of the manual sequence coding. 
Conflicts between coding and existing sequencing, partic- 
ularly for deletion and addition to the same main entry, 
could not be resolved by the computer without excessive 
programming effort. The immediate solution—again 
governed by the concept of simplicity of application— 
was to run deletions prior to processing any additions 
to the file. A single item could be deleted (and replaced 
by a correction) when additions were made to the file; 
however, deletion of any lead term and all of its sub- 
term occurrences in the file (an “all-points” delete) had 
to be run prior to processing any additions. 
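The ordering rule, deletions first and additions afterward, can be sketched as below. This is an in-memory Python illustration, not the tape procedure used in the project; the record layout (record type, lead code, subterm code) and the function names are assumptions.

```python
def all_points_delete(records, code):
    """Drop the lead entry for `code` and every reference to it elsewhere."""
    return [r for r in records if code not in (r[1], r[2])]


def apply_maintenance(records, deletions, additions):
    """Run every all-points delete before any addition is merged in."""
    for code in deletions:
        records = all_points_delete(records, code)
    return sorted(set(records) | set(additions))


# Records are (record type, lead-term code, subterm code) triples.
master = [
    ("1", "2183200", "2183200"),  # lead term record
    ("5", "2183200", "1623500"),  # NT entry under 2183200
    ("6", "1623500", "2183200"),  # reciprocal BT entry under 1623500
    ("1", "1623500", "1623500"),  # lead term record
]
updated = apply_maintenance(
    master, deletions=["1623500"], additions=[("1", "4625800", "4625800")]
)
```

Because the deletions are applied first, an addition can never be removed by a delete submitted in the same run.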


TABLE 2. Thesaurus maintenance operation coding

           CARD COLUMN
OP*   1     77   78   79   80    OPERATION
 1    1-7   0    -    -    1     Delete single line entry or entire scope note
 2    1     1    0    -    1     Add new lead term without scope note
 3    1     1    N†   -    1     Add new lead term with scope note of N cards
 4    2     1    N†   -    1     Add scope note of N cards
 5    3     1    -    0    1     Add new USE term and form reciprocal‡
 6    3     1    -    1    1     Add new USE term but do not form reciprocal
 7    4     1    -    0    1     Add new UF term and form reciprocal‡
 8    4     1    -    1    1     Add new UF term but do not form reciprocal
 9    5     1    -    0    1     Add new NT term and form reciprocal‡
10    5     1    -    1    1     Add new NT term but do not form reciprocal
11    5     1    -    0    0     Add new NT term, form reciprocal,‡ and expand§
12    6     1    -    0    1     Add new BT term and form reciprocal‡
13    6     1    -    1    1     Add new BT term but do not form reciprocal
14    6     1    -    0    0     Add new BT term, form reciprocal,‡ and expand§
15    7     1    -    0    1     Add new RT term and form reciprocal‡
16    7     1    -    1    1     Add new RT term but do not form reciprocal
17    1     2    -    -    1     Delete all occurrences of term from file

* Maintenance Operation.
‡ Reciprocals are formed in Maintenance Run 1.
§ Expansion is performed in Maintenance Run 5.


In the interest of simplicity, many modifications were
made to the programs while the actual work was in
progress. As an example, when the decision was made by
the Office of Water Resources Research to include
reciprocals for all related terms as well as for the
remaining subterms, all of the missing reciprocals could
be generated automatically by the computer and could
be reentered without further review. In the earlier
application an intellectual review was required to select
missing reciprocals, which then had to be keypunched
and added to the file.


* Combined Application for Cleanup and
Update Procedures 


In further evaluation of the logic of the different
computer runs, it was apparent that a combination of
operational steps utilizing the best features from cleanup
and maintenance would provide a simpler and far more
efficient application for either function, or both. Deletions,
however, still require special (and inefficient)
handling since two passes of the file are required to
delete all cross references to a given lead term. Since
updating the thesaurus should be infrequent, it was
determined to be more economical, and far simpler, to
accept this inefficiency rather than write a separate delete
program.

An outline of the combined cleanup-maintenance application
finally adopted is listed in Table 3.

When the combined thesaurus construction and maintenance
application is employed for either purpose, the


† No match between record types 1 and 2 is an error in Maintenance Run 3 (see Attachment 8).


Any combination not found in the above table should be considered an error. 


initial input (or update) corpus is keypunched in accord- 
ance with the instructions contained in Table 1 and 
with maintenance operation codes as shown in Table 2. 
The input data should contain at least: 


Lead term entries 
Scope notes 
Use entries 


Broad term entries: only the next broadest term is
required to "thread" the BT-NT expansion. However,
if more than one generic tree is involved, then
the next broadest generic level for each tree must be
entered.


Related term entries—no reciprocal required 


Given the input data described above, the combined 
application shown in Table 3 will perform the edit functions,
generate reciprocal UF's (used for) for each USE
entry, generate a reciprocal RT entry for each given RT, 
generate an NT entry for each given BT, eliminate 
duplicate entries, eliminate—or tag—conflicts among the 
term relationships within the subterms of a given lead- 
term entry, and then expand the BT and NT entries 
to form the full BT-NT generic set. 

Inasmuch as all terms are under program control, those 
with narrower terms will be identified and will carry a 
distinguishing tag in the published thesaurus. The re- 
production copy for final publication of the Water Re- 
sources Thesaurus was produced in single-column format. 


¹ In this instance, the rules furnished through the Office of Water
Resources Research stipulated that: (1) if a main term also appears as a
subterm, eliminate the subterm; (2) if the same term is a multiple
entry within or among BT's, NT's, or RT's, first drop the direct duplicate
appearing within the same relationship category, then drop RT's and
NT's (in that order) until only a single entry of the term remains.
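The two rules in this footnote amount to a fixed order of preference among relations. One possible reading is sketched below in Python; the function name `resolve_subterms`, the list-of-pairs layout, and the sample entries are hypothetical, and relations other than BT, NT, and RT are simply passed through.

```python
# Lower number = kept in preference; RT is dropped first, then NT.
PRIORITY = {"BT": 0, "NT": 1, "RT": 2}


def resolve_subterms(lead, subterms):
    """Apply the two footnote rules to the (relation, term) pairs under `lead`."""
    kept = {}
    others = []
    for rel, term in subterms:
        if term == lead:
            continue                      # rule 1: main term listed as its own subterm
        if rel not in PRIORITY:
            if (rel, term) not in others:
                others.append((rel, term))
            continue
        # Rule 2: direct duplicates collapse; across categories the
        # higher-priority relation survives.
        if term not in kept or PRIORITY[rel] < PRIORITY[kept[term]]:
            kept[term] = rel
    resolved = sorted(
        ((rel, term) for term, rel in kept.items()),
        key=lambda pair: (PRIORITY[pair[0]], pair[1]),
    )
    return others + resolved


# Hypothetical subterm set containing duplicates and conflicts.
cleaned = resolve_subterms(
    "FISH",
    [
        ("NT", "BASS"), ("NT", "BASS"), ("RT", "BASS"),
        ("BT", "ANIMALS"), ("NT", "ANIMALS"),
        ("RT", "FISH"), ("RT", "BENTHOS"),
    ],
)
```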
TABLE 3. Combined cleanup-update application

STEP  PROGRAM          OPERATION
 1    Cleanup #1       Card-to-tape and edit of existing Thesaurus corpus (or new additions to the Thesaurus).
 2    Cleanup #2       Sorts output from Step 1 together with the Thesaurus master in a maintenance run as shown on Tape Record Ic below.
 3    Cleanup #4       Sorts same file as in Step 2 to produce record as shown on Tape Id below.
 4    Cleanup #5       Punches out missing reciprocals from mismatch on output from Steps 2 and 4.
 5    Cleanup #5A      Card output from Step 4 to tape.
 6    Maintenance #4   Sorts output from Steps 2 and 5.
 7    Maintenance #5   Expands full generic structure for BT-NT entries.
 8    Cleanup #12      Sorts output from Steps 6 and 7.
 9    Cleanup #13      Eliminates duplication and conflicts in thesaurus relationships among subterms.
10    Cleanup #10      Sorts output from Step 9.
11    Cleanup #11      Tags all occurrences of terms that have narrower terms listed in Thesaurus.
12    Conversion #1    Sorts output from Step 11 to Thesaurus master format for printing.
13    Thesaurus Print  Prints Thesaurus copy in single column continuous print.
Magnetic Tape Record
(80 characters, 10 records to a block)

Field   A      B      C      C'      C"      D
Size    1(N)   7(N)   7(N)   (0-N)   (1-7)   45 MAX (A-N)

Field  Size & Kind                                        Name
A      Fixed length field; 1 position; numeric            Type of record.
B      Fixed length field; 7 positions; numeric           Code of term in field.
C      Fixed length field; 7 positions; numeric
C'     Fixed length field; 1 position; blank or numeric   Trailer card control.
C"     Fixed length field; 1 position; numeric            Reciprocal code for type of record.
D      Variable length field; 45 positions maximum        Actual alphanumeric as it appears in thesaurus.

Note: Character positions 63-79 reserved;
character position contains record mark.



Tape Ic
A  B        C        C' C"  D
1  2183200  2183200  1      CEREAL CROPS
6  1623500  2183200  5      CEREAL CROPS
6  2274600  2183200  5      CEREAL CROPS
6  4625800  2183200  5      CEREAL CROPS
6  6113800  2183200  5      CEREAL CROPS
6  8491500  2183200  5      CEREAL CROPS
5  1310200  2183200  6      CEREAL CROPS
5  2348500  2183200  6      CEREAL CROPS
5  3872700  2183200  6      CEREAL CROPS
5  4136300  2183200  6      CEREAL CROPS
5  5234300  2183200  6      CEREAL CROPS

Tape Id
A  B        C        C' C"  D
1  2183200  2183200  1      CEREAL CROPS
5  2183200  1623500  6      BARLEY
5  2183200  2274600  6      CORN, FIELD
5  2183200  4625800  6      OATS
5  2183200  6113800  6      RICE
5  2183200  8491500  6      WHEAT
6  2183200  1310200  5      AGRONOMIC CROPS
6  2183200  2348500  5      CROPS
6  2183200  3872700  5      GRASSES
6  2183200  4136300  5      MONOCOTS
6  2183200  5234300  5      PLANTS
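The two tape records mirror one another: every entry on Tape Ic should have a counterpart on Tape Id with fields B and C exchanged and the record type replaced by its reciprocal. Step 4 of Table 3 finds missing reciprocals from a mismatch between sorted files; the Python sketch below substitutes a set lookup for that comparison, so it illustrates the check rather than the original tape procedure. The tuple layout and function names are assumptions.

```python
# Record type -> reciprocal record type (3 USE, 4 UF, 5 NT, 6 BT, 7 RT).
RECIPROCAL_TYPE = {"3": "4", "4": "3", "5": "6", "6": "5", "7": "7"}


def mirror(record):
    """Return the record that should exist as the reciprocal of `record`."""
    rtype, lead, sub = record
    return (RECIPROCAL_TYPE[rtype], sub, lead)


def missing_reciprocals(records):
    """List every reciprocal entry that is absent from the file."""
    present = set(records)
    return sorted(
        mirror(r) for r in present
        if r[0] in RECIPROCAL_TYPE and mirror(r) not in present
    )


# Codes taken from the tape example: 1623500 BARLEY, 4625800 OATS,
# 2183200 CEREAL CROPS. The NT entry for OATS is deliberately left out.
tape = [
    ("6", "1623500", "2183200"),  # BARLEY        BT CEREAL CROPS
    ("5", "2183200", "1623500"),  # CEREAL CROPS  NT BARLEY
    ("6", "4625800", "2183200"),  # OATS          BT CEREAL CROPS
]
to_punch = missing_reciprocals(tape)
```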
Column and page make-up for camera-ready copy was accomplished
by the printer.

When the combined application is used for maintenance,
deletions must still be processed first. The new
entries can then be sorted together with the existing
thesaurus master file in Steps 2 and 3 (Table 3). The
rules for the new input remain the same as for the
initial input described earlier. The processing also proceeds
in a like manner; however, in Step 7 an option is
provided (Switch B is off) so that only the additional
generic structure introduced by the new material will
be generated, thus permitting faster update processing
in Steps 7, 8, and 9.

From actual operating experience, it was apparent that
additional modifications to the programs could be made
to reduce their running time. However, unless the
simplicity of application would be greatly enhanced
thereby to offset the added programming costs, these
modifications were not made.


* Discussion 


The development of a water resources thesaurus, as
described in the present paper, attempted to combine
the intellectual efforts of scientific specialists with the
most advanced techniques in computer technology. A
novel concept was introduced in the file organization for
the computer application. Multilist, dual file records
were employed, thus providing a practical magnetic tape
application on the IBM 1460 (later run on the IBM 360,
Model 30). Understandably, the computer techniques developed
originally, which seemed highly sophisticated at
the time, have since given way to even more advanced
techniques to better exploit the novel file organization
first used in this project.

The publication of this original effort was intended
primarily to disclose a technique and also to serve as
a "stepping stone" or "jumping off" place to those who
will be interested in further development of computer
programs for vocabulary construction and control.


* Conclusion


This paper describes one method by which a thesaurus
has been developed, making extensive use of a computer
to supplement the intellectual effort. The computer
techniques incorporate several exceptionally useful innovations
not previously disclosed in the open literature on
thesaurus development. The thesaurus developed in this
study has been used in the indexing of a recent volume
(10) of the current Water Resources Research Catalog (in
press).

Requests for further information on the applicability 
or availability of the computer programs may be ad- 
dressed to the authors. 
Analysis of Questions Addressed to a Medical Reference
Retrieval System: Comparison of Question and
System Terminologies*


Requests for subject and author searches submitted 
to the Medical Documentation Service of the Li- 
brary of the College of Physicians of Philadel- 
phia were studied. A total of 483 subject reference 
questions were analyzed for the number of question 
terms matching system subject headings (M), num- 
ber of question terms translatable to system subject 
headings (T), number of stop-list words (S), and 
number of untranslatable words (U), using the judg- 
ment of the author. The average question had one 
M term or 22% M, one T term or 21% T, two S words 
or 38% S, and one U word or 19% U. Thus 


* Definition of the Problem 


One of the urgent problems confronting information 
scientists today is investigating the feasibility of direct 
interaction between user and system in various types 
of retrieval. This direct interaction is usually referred to 
as the man-system interface. There has been a great 
deal of research on the system side of the interface 
dealing with hardware, indexing, storage, and processing. 
There has been little investigation of the other side: 
How will the user approach the system? 

The on-line use of computers for computational prob- 
lems is experimental in many places; the computer’s 
potential for augmenting human intellect in the com- 


* Thesis submitted in partial fulfillment of the requirements for the 
Master of Science in Information Science, School of Library Science, 
Drexel Institute of Technology, January 1967. 
81% of the average question was accounted for
by M+T+S; in addition, 46% of the questions had
no U words. Analysis of variance failed to show
significant (5% level) differences between doctors'
and lawyers' questions or between negotiated and
nonnegotiated questions in number of M, T, S, or U.
Study of limitations on searches for the 483 subject 
search requests and 38 requests for author searches 
showed that the requestor rarely stipulated type of 
material to be covered, languages to be included, 
time period to be covered, cost, or time for comple- 
tion. 


BARBARA FLOOD 


School of Library Science
Drexel Institute of Technology
Philadelphia, Pennsylvania 


putational sense is established; however, the potential 
with respect to reference retrieval is less clear. Such 
programs as Project MAC show that the problem of 
reference retrieval is under active investigation (1). 
Plans for automating the Library of Congress include 
the capability for subject access to the Library catalog 
via a query console (2). Project INTREX has similar 
goals (3). 

We need to know something about how the user will 
approach the system with his reference question in order 
to plan for having the system responsive to the question. 
We need to know how the user will formulate this ques- 
tion. Most important, we need measures of the respon- 
siveness of the system to the user’s question. 

Traditionally, the reference librarian mediates between 
the user and the system. The ideal reference librarian 
knows the user’s needs and the system terminology, and 


can formulate the user's question in system terms. Can 
this mediation be conducted by a machine? There is a 
need to know the correlation between user and system 
terminologies.

Implicit in the question of the correlation between user
and system terminologies is the problem of the terminologies
of different user groups approaching the same
system. In a system designed to serve a particular discipline
(or mission), is the terminology of the user specialist
significantly different from that of the nonspecialist?
Presumably individuals state their questions in
terms of their own disciplines. However, there is little
information in support of this or about the terminology
used by a nonspecialist in addressing a specialized system.

In addition to possible differences in terminology among
different user groups approaching the same specialized
system, there may be other interesting differences among
reference questions. For example, are there differences
in the limitations on the questions with respect to
material to be covered, languages to be included, time
period to be covered, etc.? In other words, is it necessary
to make different provision for different user
groups approaching the same system?

The first purpose of this study was to develop a
method for comparing user and system terminologies
and to apply the method to a group of reference questions.
The hypothesis tested was that different user
groups would formulate their questions differently with
respect to: (1) terminology and (2) limitations on
searches.

The second purpose was to develop a method of characterizing
the extent of mediation required between user
and system, given different system components.

The specific objectives were then:


1. to develop a method of objectively characterizing
questions and of comparing question and system
terminologies;
2. to apply this method to a group of reference questions;
3. to analyze the differences among user groups and
between negotiated and nonnegotiated questions
with respect to:
a. question terminology
b. request limitations;
4. to analyze the extent of mediation that might be
required between user and computer system for
different systems.


* Material and Methods 


APPROACH


A reference question can be classified in four ways:
(1) by the characteristics of the questioner (user), (2)
by the format of the request, (3) by limitations imposed
on the request, and (4) by the terminology of the
question itself. In the present study the user is characterized
simply by the profession to which he belongs.
Request format refers to what Berul (4) has called the

feedback dimension in retrievability; that is, the im- 
mediacy of the interaction. The immediacy proceeds from 
person-to-person, to telephone, facsimile, letter, etc., to 
the remote interaction of addressing historical material. 
In the present study the format is classified as oral or 
written. The phrase “limitations on the request" in- 
cludes anything the user might say about the request 
other than the question itself; examples are material to 
be searched, cost, and the time dimension of the litera- 
ture to be covered. 

There are other ways of classifying reference questions 
but most of them are derived from the answer to the 
question rather than from the question. Examples in- 
clude various classifications of answers, sources used 
for searching, time taken in searching, and physical 
form of the answer (e.g., bibliographic list). These are 
not considered here. 

The approach to the analysis of question terminology 
taken in this study includes certain background assump- 
tions which require explanation. 

For interaction with a user, a system has a certain set 
of components. There are different types and numbers 
of components in different systems. A reference retrieval 
system may have three typical components for interac- 
tion with the user. The first component is a list of system 
terms. A second is a list of rules for determining match 
to the list of system terms. A third is a list of words 
which will not appear in the system. That is to say, a 
list of entries, a list of translation algorithms, and a list 
of prevented words. Different systems will have these 
lists in different number and in different combinations. 

It is possible to analyze questions addressed to a 
system in the context of the responsiveness of the dif- 
ferent system components to question elements. This 
involves differentiating the question into analogous com- 
ponents and matching each question component to the 
corresponding system component; that is to say, 
classifying question elements in terms of system components.
Such an analytic approach ignores the syntactical
arrangement of the question components. In
addition, the elements are not necessarily single words;
each question element is a unit list match (unmatched 
elements are then words). The sum of the question 
elements which match the different lists and those 
which do not, is then an arbitrarily chosen quantity 
which represents the sum of the different kinds of matches 
and the unmatched words. 

In the present study, the three system components
are: (1) subject headings, (2) translations, and (3)
nonsubstantive words. The four question components are
(1) terms that are the same as the subject headings, (2)
terms that can be translated to subject headings, (3)
nonsubstantive words, and (4) words that do not fit into
the other three categories. For example, in the request
for material on "carcinoma metastasized to the esophagus,"
both "carcinoma" and "esophagus" are terms
that are the same as subject headings. "Metastasized"
can be translated to the subject heading "neoplasm
metastasis." "To" and "the" are nonsubstantive words.
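The classification in this example can be expressed as a lookup against the three lists. The Python sketch below is illustrative only: in the study the classification was made by the judgment of the author, not by program. The three small lists are hypothetical stand-ins for the system's subject headings, translations, and stop list, and each question element is taken to be a single word, which the study does not require.

```python
# Hypothetical stand-ins for the three system components.
SUBJECT_HEADINGS = {"carcinoma", "esophagus", "neoplasm metastasis"}
TRANSLATIONS = {"metastasized": "neoplasm metastasis"}
STOP_WORDS = {"to", "the", "of", "on"}


def classify(question):
    """Label each word M (matches a heading), T (translatable),
    S (stop-list word), or U (untranslatable) and count the labels."""
    labels = []
    counts = {"M": 0, "T": 0, "S": 0, "U": 0}
    for word in question.lower().replace(",", " ").split():
        if word in SUBJECT_HEADINGS:
            label = "M"
        elif word in TRANSLATIONS:
            label = "T"
        elif word in STOP_WORDS:
            label = "S"
        else:
            label = "U"
        labels.append((word, label))
        counts[label] += 1
    return labels, counts


labels, counts = classify("carcinoma metastasized to the esophagus")
total = sum(counts.values())
percent = {label: 100 * n / total for label, n in counts.items()}
```

For this question M+T+S accounts for every element, so no mediation beyond the three lists would be needed.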

It is assumed that a reference retrieval system could
contain any or all of the three lists. Both human and
computer question analysis can proceed in terms of such
lists. Measures of correspondence between question components
and system components are therefore possible.



SOURCE OF QUESTIONS


The literature search records of the Medical Docu- 
mentation Service (MDS) of the Library of the College 
of Physicians of Philadelphia were used as the source of 
questions. Although MDS deals solely in medical infor- 
mation, users include lawyers and other nonmedical 
persons as well as physicians. Physicians were therefore 
considered the “specialists” and lawyers and others were 
considered the “nonspecialists.” 
MDS charges a fee for its services. Therefore MDS
deals with a particular class of information need: infor- 
mation the user is willing to pay for. It is assumed that 
such a need is close to at least one kind of need someone 
might have in approaching a remote terminal. 

One source of questions to this service is the reference 
desk of the Library of the College of Physicians of Phil- 
adelphia. Questions are referred from the reference desk 
when the librarian estimates that the question will re- 
quire more than 20 minutes of searching time to answer. 
(The reference desk service is nonfee.) 

A second source of questions is direct communications 
from users. Direct communications may be by telephone, 
letter, or personal visit. This direct approach implies 
that the user is familiar with the service and has reason 
to believe that the service can handle his question. 

A third source of questions is referral from a pub- 
lishing company. The majority of the written requests 
originated in this way. The publishing company pro- 
vides coupons for 1-hour searches with purchase of a 
medical encyclopedia. MDS has a contract with the 
publishing company to conduct these searches. The re- 
quests referred by the publishing company are slightly 
different from other MDS requests in two respects: (1) 
They are received indirectly and (2) they are not “fee” 
queries in the same sense as the others. However, the 
requestor had paid for the encyclopedia and it is as- 
sumed that the formalized act of specifying the request 
and sending in the coupon reflects a “need to know” 
similar to that reflected by contacting MDS directly. 

In any case, the request is recorded on a search request
form (Appendix A¹). At the time of this study, records
were available for 5 years, totaling 521 requests. For
the most part, one individual² was responsible for record-
ing questions and making searches during this 5-year
period. Searches were occasionally delegated to others
and requests were sometimes discussed with other staff
members, but the same individual was responsible for all
searches. For this reason, the problem of intersearcher
reliability was not considered.

¹ Appendices on file at ADI Auxiliary Publications Service.
² Walter Bethel, Director, MDS.

Questions from these three sources were divided into 
oral and written formats. Oral questions (telephone or in 
person requests) were probably “negotiated.” That is, 
the searcher tried to resolve unclear points by asking the 
requestor what was meant. In this study such “negotia-
tions” or changes of question terminology were taken as
constant because they were conducted by the same 
searcher. The bias provided by the searcher’s accumulat- 
ing experience with the system and with the various 
types of questions over time is recognized but was as- 
sumed to apply to both “specialist” and “nonspecialist.” 

Negotiated questions, for this study, are those which 
were received orally, whereas nonnegotiated questions are 
those which were received in written form. 


MATERIAL


The entire set of search requests received by MDS 
from October 1961 to May 1966 consisted of 521 ques- 
tions; these were used for study. Because the system 
addressed was medical, the system language chosen was 
the list of Medical Subject Headings (MeSH) used by 
the National Library of Medicine for Medical Literature 
Analysis and Retrieval System (MEDLARS). MeSH was 
chosen because it is the main search tool at MDS. 

Consideration was given to whether the edition of 
MeSH appropriate to the year of the question should 
be used. It was decided that a single edition (1965) 
would be used for all questions. (The problem of MeSH 
changes over time and how to standardize them is a 
separate one.) 


METHODS 


A protocol sheet was filled out for each search request. 
The protocol sheet had the following information: (1) 
user profession, (2) format of the question, (3) search 
limitations, (4) verbatim question, and (5) question 
components (Appendix B). The questions were analyzed 
and tabulated and data concerning the question com-
ponents were tested statistically.


User Profession 


The user groups were “doctor,” “lawyer,” and “other.” 
“Doctor” was defined as M.D., D.O., D.D.S., and the
equivalent. This definition excluded paramedical person- 
nel such as R.N., M.T., O.T.; these were included under 
“other.” “Lawyer” included all requests from lawyers 
and law firms. “Other” included all requests not coming 
from “doctor” or “lawyer” as defined above. 


Request Format 


The request format could either be oral or written.
Oral requests were those recorded by the searcher from
an oral request received either on the telephone or in
person. A written request was one received by letter,
either directly or from the publishing company.


Search Limitations 

Search limitations refers to comments about or restric-
tions on the request other than the question terminology
itself. Limitations were tabulated as follows: Type of
material included the choices: all, articles, and other.
Languages to be included was grouped as English only,
all, and other. Time period to be covered was grouped
as current, 5 years, and other; time period means how
far back in time to search the literature; for this study
“current” was interpreted as coverage in the most recent
year's literature. Cost was grouped as 1 hour ($8), 1
hour preliminary (before authorization to proceed), and
other; all the publishing company requests were auto-
matically for 1 hour only. Time for completion was less
than 1 week and more than 1 week; time referred to
here is how soon the user wanted the material.


Terminology Analysis 


Terminology analysis was conducted from the system
side of the interface. That is, it was assumed that the
system could contain various components, or lists. A
match to an assumed list was considered one item regard-
less of how many words were in the item. Thus a subject
heading item might be one word or it might be two or
more words. Similarly a translation could be to a one-
word subject heading or to two or more words. In
either case, the match was counted as unitary. Putting
it another way, the match was to one list item rather
than to one list word. The items on the assumed lists
were those to which the question words were compared.
The analysis of terminology consisted of the author
examining each question and finding out how many
terms in the question were identical with terms in
MeSH; how many terms could be accommodated by
MeSH if they were translated into MeSH terminology;
how many words could be taken care of if a stop-list (list
of nonsubstantive words) were added to the system; and how
many words were untranslatable. Thus there were four
possible question components: (1) matching terms, (2)
translatable terms, (3) stop-list words, and (4) un-
translatable words.

Matching. There are two kinds of MeSH terms:
subject headings and cross references. The two were
recorded separately on the protocol sheet. In tabulation,
however, there were too few cross references to warrant
separate treatment.

Match was defined as an identical symbol string in the
question and in MeSH. Note that a match may be one
or several words long, depending on the MeSH entry.
A list of matching terms was compiled and the frequency
of each term was noted.

Translation. This question component resulted from
human judgment about the nearest MeSH entry. The
guide for translation was to be as objective as possible.
There are varying degrees of subjectivity in translation
as illustrated below.

Inversion occurs mainly with adjectival and preposi-
tional phrases which are changed so that the noun is
followed by its modifier in system entries.


e.g., Medical education = Education, medical
      Dislocation of the hip = Hip dislocation


Word variants refer to words with the same root; i.e.,
plurals, verb, and adjectival forms.


e.g., Tumor = Tumors
      Transplantation = Transplants


Compounding refers to a question term requiring
translation to two or more MeSH entries or, conversely,
two or more question terms comprising only one MeSH
entry.


e.g., Deprol = Meprobamate + Benactyzine
      Otomycosis = Mycosis + Otitis


Synonymy between question and MeSH terminologies 
could be difficult to evaluate as illustrated by the follow- 
ing example: 


e.g., Pitfalls and complications of gallbladder surgery = 
Postoperative complications + Cholecystectomy 


The translation of “gallbladder surgery” to “cholecystec- 
tomy” might be considered a synonymous relation. 
Strictly speaking, however, the question term might refer 
to any operative procedure involving the gallbladder, not 
just removal (e.g., stone removal, duct stretching). The 
synonymy lies in the fact that cholecystectomy is the 
most common kind of gallbladder surgery and in the 
fact that MeSH carries no other entry which might 
comprise gallbladder surgery. This example illustrates 
the subjective nature of translation by an intermediary; 
the judgment depends on the intermediary’s training and 
experience, both in the subject matter and with the 
system. To continue with this example, the other part 
of the translation (“postoperative complications”) is
even more questionable because it does not cover all 
“pitfalls and complications” which might occur before 
and during surgery. On the other hand, the alternative 
entry “surgery, operative,” would appear too general to
cover “pitfalls and complications.” 

Generic-specific relationships between question and 
MeSH terminologies comprised cases in which the MeSH 
entry was either more general or more specific than 
the question term. It was found early in the investigation 
that it was very difficult to find a cutoff point for gen- 
erality or specificity. Therefore translations of this kind 
were held to a minimum. 

A list of terms requiring translation was compiled and 
listed according to MeSH terms and also according to 
type of translation involved.

Stop-List. Another list which was compiled empirically
was a list of common words considered to be nonsubstan-
tive. Such words include articles, prepositions, adverbs,
and conjunctions. The decision as to whether to put a 
word on the stop-list was made on the basis of (1) how 
“common” the word was judged to be and (2) how 
frequently it appeared. The stop-list was generated 
empirically so that it reflects judgment first and fre- 
quency second; in a further study, the basic list would 
be modified according to some frequency criterion. It 
was assumed that on-line machine searching could include 
a dictionary of nonsubstantive words similar to the list 
frequently used with KWIC programs. Each word for 
the stop-list was counted once (as opposed to each term 
for matching and translation). 

Untranslatable Words. The remaining question com- 
ponent was the group of words which did not match, 
could not be translated, and were not stop-list words. 
These were called untranslatable words. A list of these 
words was compiled with notation of frequency. Ex- 
amples are: recover, perforation, and secretion. 

Summary. The question components included matches, 
translations, stop-list words, and untranslatable words. 
The sums of each can be formulated as follows: 


Q=M+T+S+U (1) 


where Q is the sum of the question components, M the 
sum of the matching terms, T the sum of the translatable 
terms, S the sum of the stop-list words, and U is the 
sum of the untranslatable words. 

Examples of Question Analysis. Each question word, 
starting at the left, was looked up in MeSH, and each 
component was recorded. 


Question 32. Carcinoma metastasized to the esophagus
    M: carcinoma; esophagus
    T: metastasized = neoplasm metastasis
    S: to; the
    U: —
Question 24. Isoenzymes of alkaline phosphatase
    M: alkaline phosphatase
    T: isoenzymes = enzymes
    S: of
    U: —
Question 21. Aneurysms of the uterine artery and its
branches
    M: —
    T: aneurysms = aneurysm
       uterine = uterus
       artery = arteries
    S: of; the; and; its
    U: branches
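
The left-to-right lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure used in the study, where the analysis was done by hand. The function name `analyze_question`, the miniature heading, translation, and stop lists, and the longest-phrase-first rule are all assumptions introduced here.

```python
def analyze_question(question, headings, translations, stop_words):
    """Split a question into M, T, S, and U components, scanning left to right."""
    words = question.lower().split()
    parts = {"M": [], "T": [], "S": [], "U": []}
    i = 0
    while i < len(words):
        # Try the longest phrase first so that a multiword list item counts once.
        for n in range(len(words) - i, 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in headings:
                parts["M"].append(phrase)
                break
            if phrase in translations:
                parts["T"].append(phrase)
                break
        else:
            # No list item starts here: the word is a stop-list or untranslatable word.
            n = 1
            parts["S" if words[i] in stop_words else "U"].append(words[i])
        i += n
    return parts


# Hypothetical miniature lists, just large enough for Questions 32 and 24.
HEADINGS = {"carcinoma", "esophagus", "alkaline phosphatase"}
TRANSLATIONS = {"metastasized": "neoplasm metastasis", "isoenzymes": "enzymes"}
STOP_WORDS = {"to", "the", "of"}

q32 = analyze_question("Carcinoma metastasized to the esophagus",
                       HEADINGS, TRANSLATIONS, STOP_WORDS)
q24 = analyze_question("Isoenzymes of alkaline phosphatase",
                       HEADINGS, TRANSLATIONS, STOP_WORDS)
print(q32)
print(q24)
```

Because “alkaline phosphatase” is found as a two-word phrase, it is counted as one matching unit, in line with the rule that a match is to one list item rather than to one list word.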


ANALYSES

Analysis of Variance 


The following hypotheses were tested for statistical
significance at the 5% level using analysis of variance:

1. There is no difference in the number of matching
terms among user groups or between request
formats.

2. There is no difference in the number of translatable
terms among user groups or between request
formats.

3. There is no difference in the number of stop-list
words among user groups or between request
formats.

4. There is no difference in the number of untranslatable
words among user groups or between request
formats.
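
As a rough illustration of the kind of statistic involved, the sketch below computes a one-way F ratio (between-group mean square over within-group mean square). It is a simplification under stated assumptions: the study's design involves two factors (user group and request format), the function name `one_way_f` is hypothetical, and the counts used are invented for illustration and are not data from the study.

```python
def one_way_f(groups):
    """Return the one-way analysis-of-variance F ratio for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented counts of matching terms per question for three user groups.
illustrative_counts = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio = one_way_f(illustrative_counts)
print(f_ratio)
```

The null hypothesis is rejected only when the F ratio exceeds the tabled value for the chosen significance level and the two degrees of freedom.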


Measures of System Responsiveness 


It has been assumed that a system has three possible
components or lists to correspond to three terminologic
components of the user's question. That is to say, there
might be a component for matching terms (M), a com-
ponent for translatable terms (T), and a component for
stop-list words (S). By definition there is no system
component for untranslatable words (U).

Given three components, there are seven possible 
system arrangements: (1) a match list alone, (2) transla- 
tion list alone, (3) stop-list alone, (4) match and trans- 
latable lists, (5) match and stop-lists, (6) translatable 
and stop-lists, and (7) match and translatable and stop- 
lists. 
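
The count of seven follows from taking every nonempty subset of the three lists (2³ − 1 = 7). A minimal sketch, with list names chosen here for illustration:

```python
from itertools import combinations

# Labels for the three possible system lists; the names are illustrative.
LISTS = ("match", "translation", "stop")

# Every nonempty combination of the three lists, smallest first.
arrangements = [combo
                for size in range(1, len(LISTS) + 1)
                for combo in combinations(LISTS, size)]

for number, combo in enumerate(arrangements, start=1):
    print(number, " + ".join(combo))
```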

We can therefore determine the ratio of matching 
terms (M), translatable terms (T), and stop-list words 
(S) in questions according to Equation 1, in order to be 
able to evaluate the responsiveness of different system 
arrangements to question terminology. The first thing 
we want to know is the equality (E) between question 
and system terminology; this is given by the match ratio 
expressed as a percentage: 


E = M/Q x 100 (2) 


Then we want to know the improvement that would 
be effected by the various possible combined system ar- 
rangements. By adding matching and translatable terms 
we can derive a ratio which reflects the compatibility 
(C) between question and system terminology. 


C = (M + T)/Q x 100 (3)


This ratio reflects the proportion of the question ter-
minology which is “substantive” from the system point of
view. The other two two-way combinations may be 
calculated by addition; they are not considered of 
sufficient interest for separate treatment. 

The three-way combination (match and translatable 
and stop) can be used to obtain the proportion of the 
question terminology that could be translated into system 
terminology. Translatability (Tr) is equal to the ratio 
of matching terms (M) and translatable terms (T) and 
stop-list words (S) to the total question components 
(Q). 


Tr = (M + T + S)/Q x 100 (4)
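
Equations 1 through 4 can be checked with a short calculation. The sketch below is illustrative only: the function name `responsiveness` and the dictionary keys are assumptions, and the untranslatable fraction is computed as U/Q by analogy with the other measures. The worked example uses the component counts of Question 32 (M = 2, T = 1, S = 2, U = 0).

```python
def responsiveness(m, t, s, u):
    """Return equality, compatibility, translatability, and the
    untranslatable fraction, each as a percentage of Q = M + T + S + U."""
    q = m + t + s + u                            # Equation 1
    if q == 0:
        raise ValueError("question has no components")
    return {
        "E": 100 * m / q,                        # Equation 2
        "C": 100 * (m + t) / q,                  # Equation 3
        "Tr": 100 * (m + t + s) / q,             # Equation 4
        "Untr": 100 * u / q,                     # remainder, assumed to be U/Q
    }


# Question 32: two matches, one translation, two stop-list words.
question_32 = responsiveness(m=2, t=1, s=2, u=0)
print(question_32)
```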




* Results 


GENERAL 


There was a total of 521 search requests. This total 
included 38 requests for author searches, 471 requests for 
subject searches, and 12 asking for both author and 
subject. The verbatim questions are listed in Appendix
C. Only one of the author requests was for an historical
name (Louis Pasteur, question 126). 

USER PROFESSION AND REQUEST FORMAT


The breakdown of user profession (doctor, lawyer, 
other) and request format (oral, written) is shown in 
Table 1. The "other" group included 36 requests from 
drug companies and scattered requests from students, 
laboratories, architects, librarians, hospital and surgical 
supply companies, and research foundations. No attempt 
was made to determine how many different individuals 
addressed the service. There were many examples of the 
same user asking different questions, both at the same 
time and at different times over the 5 years. There 
were also many examples of a user employing the service 
only once. 


SEARCH LIMITATIONS


Most requestors did not delimit the scope of the ques-
tion. Examination of Table 2 shows that there were
relatively few requests for all types of material, and 
when specified, articles tended to be asked for. In terms 
of languages to be included, there was a preponderance 
of requests for English language material only; the 
“other” group was small and scattered among European 
languages. Time period to be covered remained mostly 
unspecified. (The general policy at MDS is to search 
the most recent 5 years when time period is unspecified; 
author searches are generally for all publications.) The 
stipulations as to cost are skewed by the fact that all 
142 publishing company requests were for 1 hour. Analy-
sis of the time requested for completion showed that
among those asking for the material in less than 1 week,
25 wanted it as soon as possible and 13 asked for same-
day or 1-day service. In general, when one limitation was
unspecified, so were all the others.


TABLE 1. User profession and request format

                  Oral                  Written              Total
                               Publishing co.     Other
Doctor      193 +  6* = 199         137             15         351
Lawyer       80 + 32* = 112           3              -         115
Other                    31           2             22          55
Total                   342         142  (179)      37         521

* Author.


TABLE 2. Search limitations

Type of Material
    Unspecified            413
    All                     26
    Articles                66
    Other                   16
Languages to be Included
    Unspecified            384
    English only            96
    All                     32
    Other                    9
Time Period to be Covered
    Unspecified            424
    Current                 30
    Five years              22
    Other                   45
Cost
    Unspecified            328
    One hour               155
    One hour prelim.         0
    Other                   38
Time for Completion
    Unspecified            448
    One week                65
    More than one week       5


TERMINOLOGY ANALYSIS 


The verbatim questions are given in Appendix C, the 
matching list in Appendix D, the translatable list in 
Appendix E, the stop list in Appendix G, and the un- 
translatable list in Appendix H. 

The 483 subject questions generated 364 different 
matching terms which were used 559 times. The only 
matching terms that appeared frequently (8-10 times) 
were “cancer,” “patients,” “surgery,” and “trauma.” 

There was a total of 397 different question terms that 
could be translated into 338 MeSH terms; these 397 
terms occurred a total of 538 times. Analysis of the 
translatable terms is given in Appendix F in which the 
MeSH term is listed according to the type of transla- 
tion involved. There were 21 examples of simple in- 
version. Word variants represented the largest group 
(188 times) and, of these, singular to plural or plural to 
singular translations occurred 67 times; synonyms oc- 
curred 42 times, including 5 examples of abbreviations; 
compounding occurred 20 times and generic-specific re- 
lationships 31 times. The “other” group (98 times) 
included cases of more than one type of translation as 
well as a few cases that did not fall in the above 
categories. 

There were 98 words on the stop list, which occurred
a total of 975 times. The frequencies of the 17 words
occurring more than 10 times are given in Table 3.


TABLE 3. Most frequent stop-list words

Word        Frequency
of             224
the            114
in             113
and             98
to              36
or              29
on              25
for             23
as              20
with            19
a               19
by              17
effect          14
use             13
after           11
de              11
la              11


There were 362 different untranslatable words occur- 
ring 473 times. Examination of the list (Appendix H) 
shows that many of the words might be considered 
candidate stop-list words. There were also a number 
of substantive words such as “abasia,” “arteriospasm,” 
and “atresia,” which were not considered translatable 
because they were too specific (i.e., the hierarchical dis-
tance from the nearest subject heading was too great).

The average question components and percentages are 
shown in Tables 4, 5, and 6. The average question 
contained about five terms, of which approximately one 
matched MeSH, one was translatable, two were stop-list 
words, and one was an untranslatable word. The range
of Q was 1 to 25 units. 

Table 7 gives a comparison of average number of
question components and average number of question
words in each category. The word data were obtained by
counting the number and frequency of matching and
translatable terms appearing in questions which had two
or more words in these terms. The average question had
six words in it.


TABLE 4. Average question components by profession

Profession      Q      M      T      S      U      N
Doctor
  oral        4.74   1.20   1.01   1.72    .81    193
  written     5.65   1.13   1.27   2.28    .99    152
  total       5.14   1.17   1.12   1.98    .89    345
Lawyer
  oral        4.81   1.00    .86   1.75   1.18     80
  written     6.00    .33   2.00   2.00   1.67      3
  total       4.85    .98    .91   1.78   1.19     83
Other
  oral        5.81   1.35   1.26   2.10   1.10     31
  written     7.96   1.38   1.50   3.71   1.46     24
  total       6.78   1.36   1.36   2.80   1.25     55
Total         5.28   1.16   1.11   2.02    .98    483


TABLE 5. Average question component percentages by format

Format            % M    % T    % S    % U     N
Oral
  Doctor          25.6   21.2   36.2   17.0   193
  Lawyer          20.8   17.9   36.9   24.4    80
  Other           23.3   21.7   36.1   18.9    31
  Total oral      24.0   20.4   36.4   19.2   304
Written
  Doctor          19.9   22.5   39.9   17.6   152
  Lawyer           5.6   33.3   33.3   27.8     3
  Other           17.1   18.6   46.1   18.1    24
  Total written   19.2   22.0   40.9   17.8   179
Grand Total       22.0   21.1   38.3   18.6   483

There were no appreciable differences among user
groups in the percentages of M or S, but lawyers’ ques-
tions tended to have a lower percentage of T and a
higher percentage of U. Analysis by format showed that
written requests tended to have a lower percentage of
M, a higher percentage of T and S and a lower percentage
of U as compared with oral. The differences in each case
were small.

Analysis of variance failed to reveal significant differ-
ences among user professions or between formats in M,
T, S, or U. Thus it was not possible to reject the null
hypotheses. (Although the F levels required for 5%
significance were 3.02 and 3.86, the only value above 1.00
was for between formats for T at 2.84.)

The effect of applying the various measures of system
responsiveness (equality, compatibility, translatability) is
shown in Tables 8 and 9. The remaining component is
the untranslatable (Untr) fraction. The number of ques-
tions in which each measure accounted for 100% of the
question is given in Table 10.


TABLE 6. Average question component percentages by
profession

Profession        % M    % T    % S    % U     N
Doctor
  oral            25.6   21.2   36.2   17.0   193
  written         19.9   22.5   39.9   17.6   152
  total doctor    22.8   21.8   38.0   17.3   345
Lawyer
  oral            20.8   17.9   36.9   24.4    80
  written          5.6   33.3   33.3   27.8     3
  total lawyer    20.1   18.6   36.7   24.6    83
Other
  oral            23.3   21.7   36.1   18.9    31
  written         17.1   18.6   46.1   18.1    24
  total other     20.1   20.1   41.3   18.5    55
Grand Total       22.0   21.1   38.3   18.6   483


TABLE 7. Comparison of average question components and
average question words

                       Question  Matching  Translatable  Stop  Untranslatable
Components               5.28      1.16        1.11      2.02       .98
Words                    5.81      1.35        1.46      2.02       .98
Component percentage                22          21        38        19


TABLE 8. Average percentage of Q covered by measures by
format

Format       E      C      Tr     Untr    N


TABLE 9. Average percentage of Q covered by measures by
profession

Profession   E      C      Tr     Untr    N
Doctor      22.8   44.8   82.8   17.3   345
Lawyer      20.1   38.7   75.4   24.6    83
Other       20.1   40.2   81.5   18.5    55
Total       22.0   43.0   81.3   18.6   483


TABLE 10. Number and percentage of 100% measures

Profession   No.  %   No.  %   No.  %   No.  %

* Discussion 


The amount of mediation required between user ques-
tion and system response is the opposite of system
responsiveness. That is to say, mediation is required to
the extent that the system is not responsive. The
mediation can be conducted by the user himself; when
he recognizes that the system is not responsive, he
modifies the question until the system responds. Tradi-
tionally mediation has been conducted by a trained inter-
mediary; he has some knowledge of the user, negotiates
the question, determines the limitations on the search,
and translates the question terminology into system
terminology in accordance with his knowledge of system
components.

This study failed to demonstrate differences in system
responsiveness according to user profession. The content
of doctors’ questions may have been different from the
content of lawyers’ questions but there was little dif-
ference in the correspondence of question components to
system components. This suggests that, for MDS at
least, there is no need to differentiate between medical
“specialist” (doctor) requests and medical “nonspecialist”
(lawyer) requests to improve responsiveness.

The failure to demonstrate differences in system re-
sponsiveness according to the format (oral, written) of
the request casts uncertainty on the need for question
negotiation. That is, there is doubt if the purpose of
negotiation is to increase the proportion of the matching
component. Table 5 shows that although oral (negotiated)
questions were higher than written (nonnegotiated) in
M, this difference was not due to either T or U, but to
S. This indicates only that written questions tend to have
more nonsubstantive words, as might have been expected.
If, on the other hand, the purpose of negotiation is to
increase the precision of system responsiveness, this pur-
pose can perhaps best be achieved by the user reformu-
lating his question as a result of system response.


SEARCH LIMITATIONS


The finding that at MDS there were seldom limitations
imposed on searches might be interpreted to cast doubt
on the need for including such limitations in systems for
direct man-system interaction. However, there are other
considerations: In the case of written questions (i.e.,
questions that were not negotiated) the user could not
know that limitations might be useful or necessary be-
cause he ordinarily did not have a search request form
to guide him. In the case of oral questions (i.e., ques-
tions that were negotiated) limitations might not have
been asked for by the searcher or they might not have
been recorded. Limitations would tend not to be asked
for if (1) the question were estimated by the searcher
to require only a short search for an answer (author
search, single reference); (2) the usual needs of a par-
ticular class of user (profession or some breakdown of
profession) were known to the searcher; and (3) the
usual needs of a particular user were known. The ex-
perience and bias of the searcher therefore influence the
recording of search limitations.

Cases in which limitations on search were stipulated 
were so few that it was not thought useful to differentiate 
them into professions or formats. The sparse data avail-
able showed that there was a tendency to stipulate
English language articles. When the time for completion 
was specified, the requestor wanted the product in less 
than 1 week. 

In comparison, Kronick (4) found that “English only” 
was specified in all but 19 of 700 instances at the Cleve- 
land Medical Library. The difference was probably be- 
cause a higher percentage of the MDS requests were for 
“in depth” searches; i.e., everything available was wanted. 
Kronick also found that 57% of his requests were for 
recent (up to 2 years) material, 29% for 2-5 years, 5% 
for 5-10 years, and 3% for 10-20 years. Analysis of time 
period to be covered in the present study suggests con- 
currence with his figures in those instances when time 
period was specified.

Thus limitations on searches seem to be more often 
implicit than explicit. In other words, stipulations are 
implied by the kind of question and who posed it, rather 
than being detailed. In considering direct man-machine 
interaction, the importance of designating search limita- 
tions for cutting the size of the file to be searched is 
considerable. The findings of the present study suggest 
that the user does not ordinarily consider the importance 
of delimiting the search; therefore, the system would 
have to provide a specific checklist or other direct pro- 
cedure for eliciting such information. 


MEASURES OF SYSTEM RESPONSIVENESS


Equality is the criterion of absolute match between 
user and system terminology, just as would be necessary 
for a computer to recognize a match. In a system with
just one component, such as a subject heading list, 
mediation is required for all question components except 
the matching one. 

The results show that between a fifth and a quarter 
of the question terms for this study can be matched by 
MeSH without modification of MeSH or special training 
of the user (Tables 8 and 9). In addition, in 7% of the 
questions, the entire question could be matched (Table 
10). Because these were “real” questions, these measures 
may indicate what percentage of response a user would
obtain from a machine system using MeSH; however, 
it is unknown whether he would formulate his question 
to a machine system in the same way as he now formu- 
lates a reference question to a human searcher (especially 
if the machine vocabulary were provided to the user). 

Users’ terminology can be considered an indexing lan-
guage in the sense that question terms are formulated
to indicate a body of information. It is therefore of
interest to compare the correspondence between user
and system terminologies in the present study to studies
of the correspondence between and among indexing lan-
guages.

Users’ terminology is about as equal to MeSH in the
present study as MeSH is to LC in a study by Brooks
and Kilgour (6). They found 37% of subject headings
exactly the same in MeSH and LC. Adjusting the
equality figures in the present study for exclusion of stop-
list words (which do not appear in subject headings)
gives an average of 36%. Therefore users’ terminology
is as close to the indexing language as these two indexing
languages are to each other. This raises the question
of the generalizability of the findings in the present study.
The equality between indexing languages is also con-
sidered important because of studies of compatibility
among indexing languages (Schultz (7), Painter (8),
Hammond and Rosenborg (9)). The word “compatibil-
ity” was chosen in the present study to be consistent with
usage in these compatibility studies.

Compatibility is a measure of what percentage of
users’ ideas are dealt with in the system. It includes 
both matching and translatable question components and 
reflects the increase in system responsiveness obtained 
with a two-component system. Compatibility averaged 
43% (Tables 8 and 9); 18% of the questions were 
100% compatible (Table 10). 

When the compatibility figures are adjusted for ex- 
clusion of the stop-list, a figure of 70% is found for 
comparison with other studies. Schultz (7) found that 
three-quarters of the subject heading list for indexing 
the meeting papers of the Federation of American Soci- 
eties for Experimental Biology (FASEB) was accommo- 
dated by both MeSH and NIH Research Grants Index 
authority list. Similarly, Brooks and Kilgour (6) found 
79% of MeSH subject headings “adequately covered” 
by LC. These comparisons suggest that users’ vocabu-
laries are as similar to indexing languages as indexing
languages covering the same subject areas are to each 
other. 
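
The adjusted figures quoted in this section (36% for equality, 70% for compatibility, and 30% for the untranslatable fraction) are consistent with dividing each grand-total percentage by the share of the question that is not stop-list words. The text does not spell out the calculation, so the sketch below is an assumption; the function name `exclude_stop_list` is hypothetical, and the inputs are the grand totals reported in Tables 5 and 9 (E = 22.0, C = 43.0, Untr = 18.6, S = 38.3).

```python
def exclude_stop_list(percentage, stop_percentage):
    """Re-express a percentage of Q as a percentage of Q minus stop-list words."""
    return 100 * percentage / (100 - stop_percentage)


STOP = 38.3  # grand-total percentage of stop-list words (Table 5)

adjusted = {
    "equality": round(exclude_stop_list(22.0, STOP)),
    "compatibility": round(exclude_stop_list(43.0, STOP)),
    "untranslatable": round(exclude_stop_list(18.6, STOP)),
}
print(adjusted)
```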

In evaluating any two languages, a comparison of
equality and compatibility gives a criterion for the need
to expand cross-references to gain greater equality.
Analysis for type of translation needed would show what
sorts of programming are required for direct user-machine
interaction. If the findings of the present study are borne
out by future analyses of the similarity between MeSH 
and user terminology, a need would be established for 
developing machine programs that deal with simple in- 
versions and word variants; also a need for expanding 
dictionaries of cross-references to deal with synonyms. 
The results also suggest that compounding (one to more 
than one transformation) and generic-specific relation- 
ships, although difficult to handle, are statistically less 
frequent problems than other types of translations. 
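
A program of the kind suggested here, one that handles simple inversions and word variants, might look like the following sketch. It is an assumption-laden illustration rather than a tested design: the function names, the two-word inversion rule, the naive singular/plural rule, and the three-entry heading list are all introduced here for the example.

```python
def candidate_forms(term):
    """Yield variants of a question term to try against a heading list."""
    words = term.lower().split()
    yield " ".join(words)                    # identical string (a match)
    if len(words) == 2:
        # Inversion: "medical education" becomes "education, medical".
        yield f"{words[1]}, {words[0]}"
    if len(words) == 1:
        # Naive singular/plural variants of a single word.
        word = words[0]
        yield word + "s"
        if word.endswith("s"):
            yield word[:-1]


def look_up(term, headings):
    """Return the first heading that equals one of the candidate forms, or None."""
    index = {heading.lower(): heading for heading in headings}
    for form in candidate_forms(term):
        if form in index:
            return index[form]
    return None


# Hypothetical three-entry heading list.
SAMPLE_HEADINGS = ["Education, medical", "Tumors", "Esophagus"]

print(look_up("Medical education", SAMPLE_HEADINGS))
print(look_up("Tumor", SAMPLE_HEADINGS))
print(look_up("abasia", SAMPLE_HEADINGS))
```

Prepositional inversions such as “dislocation of the hip,” synonyms, and compounding are not covered by these two rules and would need a dictionary of cross-references.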

Translatability (as used in this study) indicates the
further refinement which might be brought about by
adding a stop list to the system (M+T+S). The
average question was 81% translatable (Tables 8 and
9); almost 50% of the questions were found to be 100%
translatable (Table 10). This means that, given a system
with a vocabulary list, appropriate algorithms for translation,
and a stop-list, half the time it would respond
completely to a user's question; or, looking at it the other
way, 80% of the average question could be handled without
mediation.

Addition of the M and S percentages shows that, on
the average, 60% of questions could be handled by a
system having just a vocabulary list (such as MeSH)
and a stop-list.

The remaining question component is the untranslatable
fraction; 19% of the average question remained
untranslatable (Tables 8 and 9); only 2% of the ques- 
tions were 100% untranslatable (Table 10). 

Adjusting the untranslatability figures to exclude the
stop-list terms gives an average figure of 30% for 
comparison with other studies. Schultz (7) found that 
10% of the FASEB terms were not accommodated by at
least one of the other vocabularies (MeSH, ASTIA descriptors,
and the NIH Biomedical Sciences Dictionary).
Brooks and Kilgour (8) found 15.8% of LC subject head- 
ings used at the Yale Medical Library “unmatched” in 
MeSH. Hammond and Rosenborg (9) found 10.9% of 
the Atomic Energy Commission (AEC) terms had no 
equivalents in the ASTIA descriptor list. It is difficult 
to determine whether the differences between the present
study and the other three are due to different translation 
rules or to absolute differences between user vocabularies 
and MeSH on the one hand, and between systems’ 
vocabularies on the other. A major proportion of the 
difference may be attributed to the larger vocabulary 
of natural language as compared with controlled vocabularies.
In any case, the untranslatable fraction
would appear a major area for system improvement.
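
The relations among the measures discussed above can be checked with a short calculation. The sketch below is only illustrative: the class and method names are hypothetical, and the component values (M = 22, T = 21, S = 38, U = 19) are back-solved from the rounded percentages quoted in the text (43% compatibility, 81% translatability, 60% for M plus S, 19% untranslatable), not taken from Tables 8 and 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionComponents:
    """Average share (in percent) of question words in each category."""
    match: float           # M: identical to a vocabulary entry
    translatable: float    # T: convertible to an entry by a translation rule
    stop: float            # S: words on the stop-list
    untranslatable: float  # U: everything else

    def compatibility(self) -> float:
        # Compatibility covers matching plus translatable components (M + T).
        return self.match + self.translatable

    def translatability(self) -> float:
        # Translatability adds the stop-list share (M + T + S).
        return self.match + self.translatable + self.stop

    def excluding_stop_list(self, share: float) -> float:
        # Re-express a share as a percentage of the non-stop-list words only.
        return 100.0 * share / (100.0 - self.stop)


# Assumed values, reconstructed from rounded figures in the text.
average = QuestionComponents(match=22, translatable=21, stop=38, untranslatable=19)

print(average.compatibility())
print(average.translatability())
print(average.excluding_stop_list(average.compatibility()))
print(average.excluding_stop_list(average.untranslatable))
```

With these assumed inputs the adjusted compatibility comes out near 69% and the adjusted untranslatable fraction near 31%, close to the 70% and 30% reported for comparison with other studies; the small gap is presumably due to rounding in the quoted figures.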

Whether the quantitative values of the various mea- 
sures are adequate for any one system must be left to 
the judgment of each system designer. Empirical studies
of the tolerance of users to different quantitative values
are required. However, the method of measuring described
here may be of temporary help for evaluating
indices or thesauri according to user vocabularies and 
for making changes in response to user terminology. 

TERMINOLOGY ANALYSIS 


Since the reliability of the present study depends to a
great extent on the criteria used for analyzing termi- 
nology, some comments are in order on the method used 
in deriving question components. Similarly, comments
are required about factors which may have influenced
the validity of the findings.

The average question components are shown in Tables 
3, 4, and 5. These average findings should be interpreted
as conservative figures because of the rules used in
analysis (see Methods: Terminology Analysis). Thus,
a term was only considered a match if it was identical 
to the MeSH entry. For example, a hyphen in the 
question term which otherwise matched MeSH was
enough to put the term in the translatable category
(e.g., oculomotor, question 3). Because of the left to
right analysis of the questions, a query about “myelog- 
enous leukemia” was not considered a match for “leu- 
kemia” but a translatable for “leukemia, myelogenous.” 
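
The matching rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function name `classify` is hypothetical, the toy vocabulary merely stands in for MeSH, and only two translation rules (hyphen removal and simple two-word inversion) are modeled. Stop-list handling is omitted.

```python
# Toy vocabulary standing in for MeSH; the entries are illustrative only.
VOCABULARY = {"oculomotor", "leukemia", "leukemia, myelogenous"}


def classify(term: str, vocabulary: set) -> str:
    """Classify a question term as match, translatable, or untranslatable."""
    t = term.lower().strip()

    # A match requires the term to be identical to a vocabulary entry.
    if t in vocabulary:
        return "match"

    # Word variant: a hyphen alone is enough to prevent an exact match.
    if t.replace("-", "") in vocabulary:
        return "translatable"

    # Simple inversion: "myelogenous leukemia" -> "leukemia, myelogenous".
    words = t.split()
    if len(words) == 2 and f"{words[1]}, {words[0]}" in vocabulary:
        return "translatable"

    return "untranslatable"


for query in ["leukemia", "oculo-motor", "myelogenous leukemia", "ablation"]:
    print(query, "->", classify(query, VOCABULARY))
```

Note that the whole phrase is tested as a unit, so "myelogenous leukemia" is reported as translatable to the inverted entry rather than as a match for "leukemia", which mirrors the conservative rule in the text.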

Another reason that the average component figures 
should be interpreted as conservative is that, although
MeSH was chosen as the standard for medical vocabulary,
the users were not addressing MEDLARS. They were
addressing the holdings of the Library of the College of 
Physicians and specifically MDS. There were questions 
that were essentially architectural (question 27). There 
were questions that required textbook material for. the 
answer, such as pictures of normal anatomy (questions 
191-194). Presumably if the questions not suited to 
MEDLARS had been eliminated, the proportion of un- 
translatable words would have been lower. 

Again, if the 16 Spanish language questions had been 
eliminated, the proportion of untranslatable words would 
have been lower. It might be expected that foreign 
language questions would inflate T in comparison with 
M. This was not necessarily true since much of medical 
terminology is international. In fact, some of the requests 
from Spanish speaking countries that were in English
were less likely to match than the ones in Spanish; for
example, the appearance of “different diagnosis” in question 82,
and “nourse children” in question 81. However,
the system vocabulary would not ordinarily be expected
to have such foreign language stop words as “de” and “la,”
which appear on the list of most frequent stop words
(Table 2); the stop-list was distinctly affected by the
foreign language requests.

The approach in deriving the translatable list was also 
conservative: when there was doubt, the word was con- 
sidered untranslatable. Of the categories of translatable 
terms (Appendix E), synonyms, compounding and 
generic-specific relationships proved most difficult to 
handle. Many synonyms were abbreviations; others were 
provided by Anglo-Saxon forms where the Latin or Greek 
form appeared in MeSH or the other way around. The 
usual cross-references in a dictionary or authority list 
would not be expected to have all of these. Instances 
of compounding occurred mainly with terms such as 
“tympanoplasty” which was translated to “tympanic 
membrane” and “plastic surgery.” 

Generic-specific relationships were most difficult; the 
difficulty lay in determining a rule for a cutoff in the 
hierarchy. For example, “ablation” is specific to “surgery,”
“arteriospasm” specific to “vascular diseases.” For
this study, the hierarchical distance in these examples 
was considered too great for useful inclusion as transla- 
tions. Therefore, both “ablation” and “arteriospasm” 
appear on the untranslatable list (Appendix H). 
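
One way to make such a cutoff rule explicit is to count the steps from a question term up to the nearest vocabulary entry and reject translations beyond a fixed distance. The sketch below assumes a broader-term map; the intermediate levels shown are invented purely for illustration and do not reproduce the MeSH hierarchy, and `generic_translation` and `max_steps` are hypothetical names.

```python
from typing import Dict, Optional, Set

# Hypothetical broader-term links; intermediate levels are invented.
BROADER: Dict[str, str] = {
    "ablation": "excision (hypothetical level)",
    "excision (hypothetical level)": "operative technique (hypothetical level)",
    "operative technique (hypothetical level)": "surgery",
    "arteriospasm": "vasospasm (hypothetical level)",
    "vasospasm (hypothetical level)": "vascular diseases",
}

VOCABULARY: Set[str] = {"surgery", "vascular diseases"}


def generic_translation(
    term: str, broader: Dict[str, str], vocabulary: Set[str], max_steps: int
) -> Optional[str]:
    """Return the nearest broader vocabulary entry within max_steps, else None."""
    current = term
    for _ in range(max_steps):
        parent = broader.get(current)
        if parent is None:
            return None  # top of the hierarchy reached without a hit
        if parent in vocabulary:
            return parent
        current = parent
    return None  # a broader entry may exist, but it lies beyond the cutoff


print(generic_translation("ablation", BROADER, VOCABULARY, max_steps=1))
print(generic_translation("ablation", BROADER, VOCABULARY, max_steps=3))
```

With a strict cutoff the term is left untranslatable, as in the study; relaxing the cutoff trades fewer untranslatable terms for translations of doubtful usefulness.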

The problem of what level of generality or specificity 
should be the cutoff point in translation is illustrated by 
the Hammond and Rosenborg study (9) of the converti- 
bility between the ASTIA descriptor list and AEC 
dictionary. They assigned six different categories to
what has been termed translation here. Of these, four 
dealt with the generic-specific problem. However, the
findings of the present study suggest that the generic- 
specific relation represents only a small proportion of 
the total amount of translation required. Perhaps a 
solution to this problem is not as urgent as others. 

Singular or plural forms comprised a large proportion 
of the word variant type of translation and invite 
comparison with a recent study by Bloomfield (10). He 
studied indexing of singulars and plurals in Webster's
Unabridged Dictionary, Chemical Abstracts, Nuclear 
Science Abstracts, and an IBM KWIC index and found 
marked inconsistencies. Since conversion to singular or 
plural form represented a large category of translations 
in the present study, medical terminology may be added 
to Bloomfield’s list. Although MeSH was not studied 
formally for singulars and plurals in the present study, 
MeSH was found to be inconsistent. Sometimes one is
used and sometimes the other, although no instances of 
both forms appearing were noted. Some of the reasons 
for inconsistency in use of singulars and plurals became 
clear as the analysis proceeded. One reason is apparently 
to differentiate homographs; for example, “joint” may
be an adjective or a noun but “joints” is clearly a noun
and, in a medical context, may be taken to refer to
arthrology. Another reason is the persistence of Latin
and Greek forms in medical terminology. The plural 
of carcinoma may be either carcinomata or carcinomas. 
There seems to be a tendency for the MeSH entry to 
have been constructed to avoid this problem, although 
no tabulation was made. Question plurals revealed both 
English and Latin or Greek forms (including a few 
incorrect ones). Another source of the singular being 
used in a question occurred when just a single instance
was specified. A final class was those terms appearing in
a question as singular because they were parts of nominal
compounds (e.g., question 208, mesenteric thrombosis).
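
A word-variant rule for singulars and plurals, covering both English and Latin or Greek forms, might look like the following. This is a deliberately crude sketch with hypothetical function names; the suffix rules are assumptions chosen to cover the examples in the text and would mishandle many real medical terms.

```python
from typing import Optional, Set


def number_variants(word: str) -> Set[str]:
    """Return the word together with crude singular/plural candidates."""
    w = word.lower()
    variants = {w}
    if w.endswith("mata"):
        variants.add(w[:-2])          # carcinomata -> carcinoma
    elif w.endswith("ies"):
        variants.add(w[:-3] + "y")    # arteries -> artery
    elif w.endswith("s") and not w.endswith("ss"):
        variants.add(w[:-1])          # joints -> joint
    else:
        variants.add(w + "s")         # joint -> joints
        if w.endswith("ma"):
            variants.add(w + "ta")    # carcinoma -> carcinomata
    return variants


def find_entry(word: str, vocabulary: Set[str]) -> Optional[str]:
    """Return a vocabulary entry that differs from word only in number."""
    for candidate in sorted(number_variants(word)):
        if candidate in vocabulary:
            return candidate
    return None


print(find_entry("carcinomata", {"carcinoma"}))
print(find_entry("joint", {"joints"}))
```

Because the vocabulary was observed to hold one form or the other but not both, trying every candidate against the list is sufficient here; a vocabulary listing both forms would need a tie-breaking rule.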

The stop-list was also derived conservatively in the
sense that a doubtful word was considered untranslatable 
unless it was neither substantive nor appeared more than 
two times. Therefore, the untranslatable list includes
many words which might be considered candidate stop- 
list words if they were to appear with significant fre- 
quency. The untranslatable list also includes substantive 
terms which did not appear in MeSH because they were 
too specific ("arteriospasm") or because they were not 
appropriate to MeSH coverage (Lupinus). 


* Conclusions 


Doctors’ and lawyers’ question terminologies were not 
shown to differ significantly in correspondence to medical 
system terminology; this suggests there may be no need, 
when planning for direct user-retrieval system interaction, 
to differentiate between “specialist” and “nonspecialist”
terminologies when different client groups are trained to 
approximately the same level in their respective disci- 
plines. Oral and written questions were not shown to 
be significantly different in question components and 
hence system responsiveness; this casts doubt on the 
traditional need for question terminology negotiation. 
Specifications about searches were shown to be infrequent, 
which suggests the need for explicit methods of delimiting 
searches in user-retrieval system interaction in order to 
limit the size of the file to be searched. 

About 50% of the questions in this study were 100% 
translatable (M+S+T); they could be transacted with
no intermediary given the subject heading list, stop-list, 
and appropriate translation rules. About 80% of the 
average question was translatable. The untranslatable 
20% would require either a human intermediary or a 
complex machine program for terminology negotiation 
to fill the communication gap between user and retrieval 
system. 


The Relationship of Natural and Social Sciences to Social
Problems and the Contribution of the Information
Scientist to Their Solutions*


Social problems have multiple causes, and their solutions
accordingly require a multidiscipline approach,
which is facilitated by the fact that technology, the
natural sciences, and the social sciences are closely
interrelated (a point of view that is making itself
increasingly felt in educational theory). The deterioration
of the inner city is an example of a typical complex
social problem that will yield only before such
a unified attack. Solutions to social problems have
been suggested by findings from such varied fields
as astrophysics, sensory psychophysics, and population
studies, as well as the more theoretical social
sciences, whose influence can be seen in their application
to problems of urban development. Although
information specialists have hardly yet developed a
full-fledged body of knowledge, they can contribute
much towards the solving of social problems by: (1)
organizing and disseminating information to those
broad-gauged individuals and groups that are working
on problems that defy solution by a fragmented
approach; (2) controlling the rapidly mushrooming
body of pertinent technical literature; (3) developing
periodical indexes for the field of social welfare;
and (4) assisting in the compilation of state-of-the-art
papers on social problems, basing their work on
a value-oriented extracting technique derived from
a model created by the National Association of Social
Workers.


JOE R. HOFFER


National Conference of Social Welfare
Columbus, Ohio


* Presented at the Central Ohio Chapter, American Documentation
Institute, Columbus, Ohio, April 21, 1966.


* Introduction


There has been a growing recognition by theoretical and
applied social scientists that interdependence is a condition
of modern life. Accordingly, the major assumptions
of this paper are:

1. This interdependence extends beyond the social
sciences and includes the physical and the biological
sciences.

2. Social problems result from a variety of causes, and
to obtain real understanding, knowledge needs to be
drawn from many sources.

3. This knowledge is not restricted to the purview of
a single discipline or field, nor can it be divided into
mutually exclusive categories.

The present paper will explore the premise that information
science and information specialists can make
a major contribution to the solution of basic social
problems by collecting and integrating pertinent knowledge
from the physical, biological, and social sciences,
and by relating it directly to selected critical areas. These
assumptions will be discussed under the following major
headings:

1. What are some of the basic social problems? 

2. What is the relationship of the physical and bio- 
logical sciences to the social sciences, and the relation 
of the three to social problems? 

3. What are some possible contributions of the information
scientist?


* What Are the Basic Social Problems?


For our purposes, the basic social problems are those 
that are fundamental and universal—problems which exist
in many communities and countries. Just as different 
scientific and professional disciplines emphasize different 
aspects of the whole person to arrive at a “theory of 
man” (philosophical, theological, biological, psychological,
psychoanalytical, sociological, etc.), it is understandable
that these professions and scientific disciplines will emphasize
different social problems.


In the words of Arthur Blum, “Various authors have 
described us as a lonely crowd (1), growing up absurd 
(2), in an insane society (3), composed of status seekers
(4), exurbanites (5), organization men (6), and the 
invisible poor (7). These writings describe systems of 
extensive distortions in our present social functionings. 
What is of additional interest about this collection of
books is the variety of backgrounds and disciplines represented
by the authors, indicating increasing concern
with common problems within a number of different
fields” (8). The monumental rise in psychoses, neuroses,
alcoholism, crime, delinquency, poverty, divorce, and
suicide, to name only a few of our social problems, increasingly
testifies to the extent and variety of the social
pathology which confronts us today.


* What Is the Relationship of the Physical and
Biological Sciences to the Social Sciences, and
the Relationship of the Three to Social Problems?


The solution of the social problems of our society 
requires the productive interaction of many disciplines 
—the natural sciences, the social sciences, engineering, 
and management. 

The findings of the natural sciences have a profound 
effect upon social science, social welfare, and social 
problems. This is as it should be, for man himself was 
first scientifically investigated as a physical entity. Nevertheless
he is a social being, and as such is the basic unit
of concern in the study of collective behavior. 

Interrelationships are particularly easy to see in the
influence of chemistry, pharmacology, and medicine upon
areas of concern to public health. Take venereal disease,
for example, with its host of accompanying social complications.
It can now be effectively brought under
control thanks largely to a medical breakthrough, namely 
the discovery of antibiotics. An even vaster problem, 
mental and emotional illness, which could be said to 
affect directly or indirectly almost every area of social 
malfunctioning, now shows some sign of yielding to cer- 
tain chemical compounds known as tranquilizers and 
antidepressants. The application of DDT and similar 
insecticides has made possible the effective eradication 
of pest-borne diseases (such as malaria) which formerly 
were endemic to many areas of the world.¹

In another area, technology (the child of the natural 
sciences) has made possible, in countries such as ours, 
so great an abundance of the necessities of life and so
great an increase in national resources that we now,
in all seriousness, propose to abolish poverty, one of
the most ancient and universal of our social problems.


¹ However, the new insecticides are, in turn, producing problems, for
the Department of Health, Education, and Welfare and the Department
of Agriculture are both becoming quite concerned about the residual
buildup of insecticides both in the soil and in ground water.


Nor can the body of knowledge of the social sciences
escape the impact of technology. Such an increase of 
wealth affects in many ways (both obvious and subtle) 
the very structure of society, thus providing new raw 
materials for the economist (who studies the implications 
of the distribution of this wealth), for the sociologist 
(who studies its effects on social class and stratification), 
and for the political scientist (who studies the significance 
of this social change for governmental institutions and 
processes). The findings of such inquiry are of profound
significance to the theory of human collective behavior. 

If scientific breakthroughs can solve social problems, 
they can also create them. Progress in public health, 
by lowering the death rate, acts as a primary factor in 
the population explosion; and we are all too familiar with 
the social and political consequences of certain well-known
discoveries in nuclear physics. The computer,
with its impressive potential for the speedy and accurate 
storage, integration, control, dissemination, and imple- 
mentation of data may also turn out to be the villain 
in its forthcoming role as the boss of the automated 
factory, which some economists and labor leaders see 
as the root of large-scale future unemployment. Social 
welfare, and especially the field of recreation, view auto- 
mation as the source of a possible superabundance of 
leisure, which, for healthful living, they must help to 
fill with meaning. 


INTERDISCIPLINARY RELATIONSHIPS AND EDUCATION 


Regarding the relation of science to the social sphere, 
Derek J. de Solla Price (9) believes that both government
and scientists are interested in ensuring that science be 
promoted for the good of society and the nation: “A 
scientist,” insists Price, “does not need political motiva- 
tion to be conscious of the social relations of science.” 

This interdependence is reflected in the field of educa- 
tion, whose leaders are aware that, as C. P. Snow has 
said, “the purely scientific education is incomplete, but a 
purely non-scientific education is also incomplete” (10).

The schools of the future will incorporate this concept 
into the curriculum. A recent article in the New York 
Times reports on a study recommending “that all stu- 
dents be offered a new set of general education courses 
in their senior year in which they would relate the liberal
arts to such specific areas as urban renewal, the develop- 
ment of new states, the problems of the public bureauc- 
racy, and the philosophy of science . . . that every 
student, in addition, should be required to take a one- 
term course in economics, sociology, government, anthro- 
pology, or geography . . . that all students should take 
a two-year mathematics-physics, or mathematics-biology 
sequence” (11).

It is also evident that our educational institutions 
must give greater attention to the developing of broad-gauged
leaders. In the words of John W. Gardner, Secretary
of the Department of Health, Education, and Welfare,
“Leadership is dispersed among a great many groups
in our society. . . . Nothing should be allowed to impair
the effectiveness and independence of our specialized
leadership groups. But such fragmented leadership does
create certain problems. One of them is that it isn’t
anybody’s business to think about the big questions that
cut across specialties—the largest questions facing our
society” (12).


WHAT IS A SOCIAL PROBLEM? A TYPICAL EXAMPLE


As one studies the current activity on a specific front— 
for example, urban problems—one is led, as Rouse has 
expressed it, “to a depressing conclusion . . . there is
absolutely no dialogue in the U. S. today between the 
people who have developed knowledge about people— 
the teachers, ministers, psychiatrists, sociologists—and 
the people who are designing and building our cities 
(13).” He believes that we are not asking the right
questions, and so we are not getting the right answers. 

The deterioration of the inner city is an example of 
a social problem condition that must be considered in 
urban development. The inner city refers to “a zone 
of land that circles the central business district of the 
metropolis, extending outward toward the city boundaries”
(14).

Northwood (14) defines four major concepts in sufficient
detail to permit forward movement: social conditions,
social problems, social work, and the inner city. In
his analysis, he catalogs an impressive list of conditions 
associated with the core of the city under three major 
headings: (1) land use in the inner city; (2) people 
of the inner city; and (3) social control of the inner 
city. 

Northwood’s major assignment was to examine these 
conditions and to identify to what extent, traditionally 
and currently, organized social welfare has recognized 
them as social problems appropriate to its work. A 
secondary assignment was to assess the contribution of 
social welfare to the solution of these social problems. 

There is ample evidence that social welfare and the 
social work profession have attempted to ameliorate such 
social conditions and problems in a wide variety of 
specific programs and services over a long period of 
time. There is also ample evidence that these conditions 
and problems are persistent and hardy, and will require 
new and revolutionary approaches and services.

The evidence would favor, in Cohen’s words, “broad 
programs of social reconstruction as against specific programs
for specific social problems. This economic and
physical planning cannot be separated from social and 
psychological planning (16).” 


THE CONCERN OF THE PHYSICAL SCIENCES


There are some significant examples of the applica- 
tion to social problems of methods developed by the 
physical sciences. The Pennsylvania Department of
Public Welfare is working with Dr. Leon Stegg, astrophysicist
and manager of the Space and Missile Technology
Center of the General Electric Company, with
a view toward using the systems analysis approach to
the solution of human problems, particularly in public
welfare. Similar work has been done in California, and
a bill has been introduced in Congress to give financial
support to the states for further experimentation in this 
area. 

Methods of sensory psychophysics have been used to
gauge the intensity of opinions and attitudes (17). Ex- 
periments in a dozen laboratories have shown how pro- 
cedures developed for the scaling of sensory attributes 
such as brightness and loudness can measure human 
reactions to many forms of nonmetric stimuli. Stevens 
tells how this procedure (called “magnitude estimation”)
has been used to assess the consensus concerning inten- 
sity or degree for such variables as strength of expressed 
attitudes, pleasantness of musical selections, seriousness 
of crimes, and other subjective dimensions for which the 
stimuli can be arrayed only on nonmetric or normal 
scales. 


THE CONCERN OF THE BIOLOGICAL SCIENCES FOR
SOCIAL PROBLEMS


Guhl (18) in his article “Sociobiology and Man," re- 
views current knowledge of the biological basis of human 
sociality, and suggests that sociobiologists need to be 
aware of current information in the human social sciences. 

To be specific, let us turn to a realm of social problems 
in which changes in value orientation on the part of the 
populace have been occurring, namely family planning. 
As Dael Wolfle suggests, “The fundamental problem is 
people. Whatever we do to increase food supplies, con- 
serve water, improve land management, or curb pollution 
merely postpones for a few years the day of catastrophe 
unless we stop increasing the number of hungry
mouths. . . .” (19). Within the past few years remarkable
progress has been made in this field, despite the intense
values of what might have been a majority of the
population in the beginning. Writing in the New York
Times, Ambassador Bowles states:


India has embarked upon the world’s most ambitious
control program. A program which one Indian official
describes as at the very center of planned development.
The number of family planning clinics, which totaled
144 seven years ago, has increased to 8,504—most of
them in rural areas. By 1966, some 4,000 new clinics
will have been added. 20 million posters and 60 million
pamphlets have been distributed on the subject of
family planning. Moreover, there is some evidence,
admittedly uncertain, that the effort is already showing
results. National birthrates, which averaged 48 per
thousand in 1951, have dropped to 40. Moreover, in
Bombay, where a vigorous program has been in effect
for several years, the rate is down to 28 per thousand.
In rural areas, intensive pilot programs have resulted
in a decline in the birthrate by as much as 30%.
Indian planners and U. S. advisers, working through
the Ford Foundation, feel there is a reasonable chance
that India’s population may begin to balance out in
the 1970’s, with a birthrate of 25 per thousand and
a death rate of 20 (20).


A dramatic illustration of the implications of the 
population growth in urban areas is an estimate, made 
by the United States Municipal News and based on U. S.
conditions, which indicates that “Every 1000 new people
in the metropolitan area require: 4.8 elementary school
rooms; 3.6 high school rooms; 8.8 acres of land for
schools, parks, and play areas; an additional 100,000
gallons of water per day; 1.8 new policemen; 1.5 new
firemen; 1 additional hospital bed; 1000 new library
books; a fraction of a jail cell; sewage and treatment for
170 pounds of organic water pollutants per day” (21).


THE SOCIAL SCIENCES: THEIR RELATIONS WITH THE
NATURAL SCIENCES AND THEIR CONCERN FOR SOCIAL
PROBLEMS


Talcott Parsons (22) provides some basis for analyzing
interrelations among the various disciplines by examining
the formal preparation of social scientists in fields
other than their own. He suggests that there is a historical
and logical basis for dividing intellectual disciplines
into the natural sciences, the social sciences, and the
humanities. By these gross measures, the social sciences 
are largely independent of the natural sciences and 
mathematics, but have closer ties to the arts, the pro- 
fessions, and the humanities. 

An excellent example of the relationship of the social
sciences to a key social problem area is found in the
field of urban development, to which we referred earlier
in this section. Lawrence K. Northwood believes that
“most information about the city and its subareas is
found in studies of the social scientists. Such studies
vary widely among the academic disciplines. The economic
geographers originally stressed the physical habitat
of man, but have moved over to cite human ecology;
the anthropologists visited agrarian communities overseas,
but now probe into industrialization and urbanization,
no matter where their locus may be; political scientists
have tended to examine the formal and informal
boundaries of power and its social structures, investigating
international, state, and local questions; rural sociologists
and adult educators traditionally have been concerned
with small-town social action systems” (25).


² As for the increasing role of mathematics in social sciences, Abraham
Kaplan writes that “A troubling question for those of us committed to
the widest application of intelligence in the study and solution of the
problems of men is whether a general understanding of the social sciences
will be possible much longer. Many significant areas of these disciplines
have already been removed by the advances of the past two decades
beyond the reach of anyone who does not know mathematics; and the
man of letters is increasingly finding, to his dismay, that the study of
mankind proper is passing from his hands to those of technicians and
specialists . . .” (23).

This suggests that to attain mathematical competence the social
sciences should consider a marriage with mathematics rather than some
other less systematic arrangement. The American Sociologist, in its
November 1965 issue, reports that “The application of mathematics and
logic in sociology appears to have reached the point that active research
and teaching require some background in mathematics, and specialized
positions in sociology departments are needed which only can be filled by
persons with mathematical training” (24).


As for another example, social welfare and the social 
work profession draw heavily upon other fields and professions.
To quote Taber and Shapiro, “evidence of
borrowing knowledge from other fields was found in the 
use of recognized authorities, concepts, and theories for 
exposition or interpretation” (26). To quote Ruth Butler,
“It is evident that collaborative work with related 
sciences and professions will be needed to secure the 
content required for understanding of each knowledge 
area recommended as a desirable objective (27).” 


* What Contribution Can the Information Scientist
Make to the Solution of Social Problems? 


Although there is little evidence that information 
science has achieved the full status of a profession, the 
growing literature on information and its documentation 
suggests that this goal is not too far off. It appears that 
a recognizable common body of special knowledge will 
evolve from such present separate and independent 
activities as library science, documentation, information 
storage and retrieval, linguistics, machine translation, and 
information systems engineering. Furthermore, the lit- 
erature and the growing complexities of dealing ade- 
quately with knowledge leave little doubt that a second 
requirement of a profession will be met, namely, a 
recognized task for its members to perform. 

Mooers (28), Cuadra (29), Heilprin (30), Slamecka 
(31), Kent (32), and Crosland (33) have analyzed the
development of this new profession and have attempted 
tentative definitions. Perhaps the most useful one for 
our purposes was reported by Heilprin, namely “the 
science that investigates the properties and behavior of 
information, the forces governing the flow of informa- 
tion, and the means of processing information for 
optimum accessibility and usability. The processes in- 
clude the origination, dissemination, collection, organi- 
zation, storage, retrieval, interpretation, and use of 
information. The field is derived from or related to 
mathematics, logic, linguistics, psychology, technology, 
operations research, the graphic arts, communications, 
library science, management, and some other fields (34).”

Since the focus of this paper is on the possible con-
tribution of the information scientist to the solution of
social problems, it is necessary to provide an operational
definition for the practitioner in this profession. The
information scientist has been defined as follows: “One
who studies and develops the science of information
storage and retrieval, and who devises new approaches
to the information problem, who is interested in informa-
tion in and of itself (34).”

It is my conviction that an information scientist with
an orientation in the physical, biological, and social
sciences can make a vital contribution to researchers,
administrators, and policy groups in solving our pressing
social problems.


THE CONTROL OF TECHNICAL LITERATURE


An interesting presentation of the acceleration of our 
time has been prepared by J. Lewis Powell (35). He
compresses the 50,000 years of mankind’s recorded his- 
tory into 50 years, and on this basis develops the following 
chronology: 


1. Ten years ago, man left his cave for some other
kind of dwelling.

2. Five years ago, some genius invented the first writing.

3. Two years ago, Christianity appeared.

4. Fifteen months ago, Gutenberg developed the print-
ing press.

5. Ten days ago, electricity was discovered.

6. Yesterday morning, the airplane was invented.

7. Last night, radio.

8. This morning, television.

9. The jet airplane was invented less than a minute ago.


We might add, earth-orbiting occurred 30 seconds ago, 
a moon shot may occur within the hour, and interplane-
tary vacations may occur tomorrow. 

Consider this time-table in relation to the following 
definition of science, which identifies it totally with its 
documentary output: “Science is that which is published 
in scientific journals, papers, reports, and books. In 
short, it is that which is embodied in the literature (9).”
These facts, together with the simple observation that
recorded knowledge accumulates through the years,
whereas the rate at which it can be read by any person
remains constant, have profound implications, not only
to scientists and administrators, but to information
specialists, namely: that the technical literature of all
sciences and professions is accumulating at a rate which
has been compared to that of geometric progression, and
that the heavy burden of introducing order into this
potential chaos lies squarely upon the shoulders of
librarians, documentalists, and information specialists.


GENERALIZATION VERSUS SPECIALIZATION: THE ROLE OF
THE INFORMATION SCIENTIST 


The increasing volume of publications means inevitably
that the expert tends to confine his attention to an
ever-narrowing area, while the literature tends to be-
come more and more scattered. K. William Kapp, for
example, in his book entitled Toward a Science of
Man in Society, highlights the problems of generaliza-
tion and specialization: “Systematic scientific inquiry in
the social sciences today is marked by a curious contradic-
tion. On one hand, we are witnessing a rising demand
for intellectual cooperation and integration, which finds
expression in various interdisciplinary endeavors and
cooperative ventures by scholars from different disci-
plines; on the other hand, the traditional compartmen-
talization of the social sciences has continued, and is
vigorously defended on the ground that specialization is
the prerequisite for all creative work in scientific inquiry,
as indeed in all fields of human endeavor (36).”

The suggestion that we combine the various sciences
into a single body of knowledge focused on a selected
social problem raises some real difficulties for the
information specialist. It appears evident
that there is a need for a form of common integration,
semiautomatic in nature, that will cross the various dis-
ciplines and feed its findings back to the individual re-
searcher. There are two principal obstacles that must
be overcome before such a common integration can be
produced. One is the increasing volume of publication
and the other is the present incompatibility of the
physical, biological, and social sciences.³

Information science can make a definite and related 
contribution by organizing and disseminating information 
to those broad-gauged and generalist individuals and 
groups that are working on social problems that defy 
solution by a fragmented technique or approach. 


SOME SPECIFIC CONTRIBUTIONS TOWARDS THE SOLUTION
OF SOCIAL PROBLEMS


The general function of the information specialist is 
the organization of the literature, a function which can 
act as a force toward integration and synthesis. How- 
ever, one concrete and major contribution information 
specialists and librarians could make toward the resolving 
of social problems would be to strengthen the basic 
bibliographic tools of the professions which must cope
with them. A student, researcher, practitioner, or ad-
ministrator who wishes to locate articles in which he is
interested might have to consult as many as 16 different 
indexes and indexed abstract journals, none of which 
really specializes in social welfare, or claims to list exten- 
sively papers dealing with it. 

Another specific contribution that information spe-
cialists can make is to assist researchers in compiling
state-of-the-art papers on selected social problems. To
increase the usefulness of available knowledge, a model
(38) developed by the National Association of Social
Workers is suggested. In brief, the model is value ori-
ented, and emphasizes the gaps between the ideal ob-
jectives and the actual operations.

The model stresses the following major considerations:


1. Definition and etiology of the problem.

2. The societal norms and values, and the assumed
scientific and professional norms and values affecting
the problem.

3. The current programs actually dealing with the
problem, and the consequence of continuing these pro-
grams.

4. The ideal or the social change objective.

5. The relationship between the actual and the ideal:
identification of the gap between them, sources of
resistance to, and support of, the closing of this gap,
action priorities for scientific and professional groups,
and theory and research needs for attaining necessary
knowledge and programs.


³ As for the latter, social science data do not resemble physical and
biological data sufficiently to be comparable to them. Foskett suggests
that social scientists have approached modern retrieval systems with
caution because librarians have assumed that there is little difference in
these various data. He stresses the fact that “what distinguishes the
social sciences, perhaps, is the extent to which subjective attitudes and
imprecise terminology appear in the literature, and the masterful manner
in which some scholars dispose of their opponents” (37).


As for the application of this technique to the litera-
ture of a specific social problem, one would first, of
course, have to assemble the pertinent documents. Be-
cause of the multifaceted quality typical of most social
problems, an information specialist or librarian conduct-
ing such a literature search should be able to bring
a wide spread of materials, from many sources and
professions, to focus on his topic. After a working
bibliography has been compiled, he might proceed to
apply the model by extracting key sentences and para-
graphs according to its five rubrics. Not every article
would have material to match each breakdown, but
negative results would be as meaningful in their way
as positive ones. The researcher—sociologist or social
worker—would now be ready to make the final selection
and to write the article.
• Conclusion

The achievement of the goal of interdependence is
complicated by the fact that each profession and field
has maintained that specialization is the prerequisite
for all creative work in all human endeavor. However,
there is a rising demand for intellectual cooperation and

integration, a demand which finds expression in various 
interdisciplinary enterprises and cooperative ventures by 
scholars from different disciplines.

The last 10 years have brought a new expertise to
the administration and nourishing of science and tech-
nology for social goals. Information specialists can relate
this new body of knowledge to the specialized information
and experience which have accumulated over the years.
The net result will provide a rich cross-fertilization
among the various fields of scholarship and between each
of them and the relevant areas of expert organizational
Saha 
Analysis and Automated Handling of Technical Information
at the Nuclear Safety Information Center*


The Nuclear Safety Information Center serves the nu-
clear community by collecting, storing, evaluating,
and disseminating safety information relevant to the
design and operation of nuclear facilities. In 1964,
after about a year of operation, the information-
handling system was computerized in order to in-
crease broadly the scope of the Center's services and
enable efficient functioning in the future. Computer
programs were developed for the preparation of a
bibliography, complete with key-word and personal
author indexes, that is issued quarterly and for a
program of selective dissemination of information
(SDI) that is produced on 5 x 8 in. cards. These pro-
grams and other services of the Center are discussed.


J. R. BUCHANAN and F. C. HUTTON

Nuclear Safety Information Center,
Oak Ridge National Laboratory
and
Computing Technology Center,
Union Carbide Corporation


The USAEC established the Nuclear Safety Informa-
tion Center (NSIC) at Oak Ridge National Laboratory
(ORNL) in March 1963. The Center serves the nu-
clear community by collecting, storing, evaluating, and
disseminating safety information relevant to the design
and operation of nuclear facilities (1). It was in opera-
tion almost immediately after its establishment because
the scientists and engineers necessary to the operation of
a center, and without which an information center is
hardly more than a specialized library, were already on
the ORNL staff or available through existing consulting
[arrangements].

The subject of nuclear safety was divided into 19
categories, such as Accident Analysis, and technical per-
sonnel were assigned on a fractional-time basis to study
these categories, prepare review articles and reports,
answer inquiries, and catalog information. Each reference
reviewed by the specialists is indexed according to a
system of key words developed by the staff. The key
words, title, author, corporate author, and abstract for
each document were initially recorded on 5 x 8 in. cards
and duplicate cards were filed under each key word,
author, and corporate author.

This system enabled the Center to get into operation
immediately and was quite workable as long as informa-
tion items were not too numerous. After about a year,
however, it was decided to computerize the system for
use of the IBM 7090s at the Computing Technology
Center¹ in order to broadly increase the scope of the
Center’s services and prepare for future growth without
burdening the technical staff with the routine. Two
computer outputs were initially planned: the first was
a bibliography, complete with key word and personal
author indexes, to be issued quarterly; and the second
was output in the form of cards for a program of selec-
tive dissemination of information (SDI). Both of these
outputs are now in operation.

The development of these programs, particularly the
SDI, is described and the range of services and or-
ganization of the Center are discussed in this review.


* Research sponsored by the U.S. Atomic Energy Commission under
contract with the Union Carbide Corporation.


• Computer Program Development


A prime consideration in developing the computer
programs was to keep the system flexible enough to
permit growth of NSIC operation and to make it feasible
to extend the system to the work of other information
centers without major modifications to the programs.
Both these requirements dictated that the programs be
fast and that the capacities of the programs be large.
Speed was obtained by writing the highly repetitive
parts of the programs in symbolic language subroutines
usable by the programs, which are written in COBOL.
Large capacity was obtained by always being attentive
to the amounts of computer core storage required by
different techniques. To date, four other centers have
used the basic programs.


¹ The Computing Technology Center, which operates computers and does
problem analysis and programming, is operated by the Union Carbide
Corporation Nuclear Division for the U.S. Atomic Energy Commission at
the Oak Ridge Gaseous Diffusion Plant. F. C. Hutton supervises the
Information Retrieval Section of the Information Systems Department
of the CTC.

Four programs make up the system; one will form 
and update a Master Tape; one will select items from the 
tape and prepare a bibliography; and one will search 
the tape in response to questions. The fourth program 
is used to maintain the key-word file that appears on 
the front of the Master Tape. 

The following information appears on the computer
tapes, which are organized in linear fashion so that every-
thing concerning one item on the tape appears together
serially; the asterisk indicates that the element is search-
able or can be discriminated at this time:


*1. Type, such as reports, journal articles, etc.
*2. Evaluation of contents (as to pertinency)
*3. Category (such as Accident Analysis)
*4. Journal abbreviation (ASTM's Codes)
*5. Date
*6. Language
*7. Country
*8. Corporate author
*9. Personal author(s)
*10. Title
11. Description, such as pages, figures, tables
12. Abstract
*13. Key words


Key words are weighted on searching, with an ac-
ceptable total weight being specified, and negative
weights are permitted. Search elements can be con-
nected on an AND/OR basis. Another form of weight-
ing is used when the references are indexed by assign-
ing an asterisk to the key words of primary importance
in each document. These asterisks show in the various
printed outputs of the Center.
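
The weighted search just described can be illustrated with a short modern sketch. It is not the Center's COBOL program; the `Item` record, the function names, and the sample weights are hypothetical, chosen only to show how a total key-word weight is compared with an acceptable total and how a category test can be joined to it on an AND basis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One accession on the tape; only the fields used below (assumed layout)."""
    title: str
    category: int
    key_words: frozenset


def weighted_score(item, weights):
    """Add up the weights of the query key words found on the item."""
    return sum(w for kw, w in weights.items() if kw in item.key_words)


def matches(item, weights, target, categories=None):
    """AND: the category test (when given) and score >= target must both hold.
    OR: any combination of key words may supply the required weight."""
    if categories is not None and item.category not in categories:
        return False
    return weighted_score(item, weights) >= target


# Hypothetical item and query; a negative weight counts against an item.
item = Item("Illustrative report", 5,
            frozenset({"Hazards analysis", "Accident probability"}))
query = {"Hazards analysis": 2, "Accident probability": 1, "Siting": -2}
```

With these sample values the item scores 3, so it is selected at an acceptable total of 3 but not at 4, and it is rejected outright if the search is limited to a category other than 5.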


• SDI Profiling and Operation


In the SDI program, NSIC sends out abstract cards
to over 800 members of the nuclear community ac-
cording to their individual interests. The program was
inaugurated in October 1965 on a pilot scale. It gained
such an enthusiastic reception from the initial partic-
ipants that a decision to expand the program was made
soon thereafter. Since early in 1966, additions to the
program have been made on a routine basis. This is
expected to continue for some time.

In setting up the individual profiles, several manage-
ment level individuals were initially invited to define
their specific interests in the field of nuclear safety. Some
50 of the group of 80 responded with the needed in-
formation. A profile of interest was then constructed
for each participant by assigning the appropriate key
words from our indexing vocabulary in a fashion similar
to that used in indexing the subject content of a docu-
ment. Structuring was done in such a fashion as to give
a small fraction of false drops and not have the profile
drawn so compactly as to eliminate references of true
interest.

Each key word was weighted and a target score was 
assigned to each profile as shown in Table 1. Negative 
weights were permitted and provided a powerful tool to 
apply when developing search strategies. (One use will 
be discussed later in the section on profile adjustment.) 
A bibliographic accession on the computer tapes must
equal or exceed the target score before the item is con- 
sidered to meet the search parameters. An individual 
with varied interests was assigned more than one profile. 
Subject categories were then assigned to the profiles 
where appropriate. This limits the amount of Master 
Tape to be searched and conserves computer time. 

NSIC’s computer storage files are updated approxi-
mately every 2 weeks with the latest accessions. The
SDI profiles are matched against the new material during
this process. The computer scans its memory, checks each
accession, and assigns the corresponding word weights to
any of the key words that match those on the interest
profile. When the total weight is equal to or greater
than the target score, the computer prints the title,
author, abstract, and key words on one of the specially
prepared continuous-form 5 x 8 in. cards. The cards are
folded in accordion style with the address card on top
so that a package can be mailed to a participant rather
than several individual cards. A portion of a typical
SDI output is shown in Fig. 1. The first section depicts
the preaddressed reverse of the card form. Also shown
are an address card, a typical accession card, and a
feedback card.
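
The matching cycle described above can be sketched as follows. This is an illustration under stated assumptions, not the Center's program: the dictionary layout, the function name `run_sdi`, and the sample profiles are hypothetical. It shows each new accession being scored against every profile, the category restriction being applied first, and the drops being gathered into one package per participant.

```python
from collections import defaultdict


def run_sdi(accessions, profiles):
    """Return {participant: [titles]} for one update cycle.

    accessions: dicts with 'title', 'category', 'key_words' (assumed layout)
    profiles:   dicts with 'participant', 'weights', 'target', 'categories';
                an empty 'categories' set means no category restriction
    """
    packages = defaultdict(list)
    for acc in accessions:
        for prof in profiles:
            if prof["categories"] and acc["category"] not in prof["categories"]:
                continue  # category limits the search, as in the text
            score = sum(prof["weights"].get(kw, 0) for kw in acc["key_words"])
            # A participant with several profiles gets each item only once.
            if score >= prof["target"] and acc["title"] not in packages[prof["participant"]]:
                packages[prof["participant"]].append(acc["title"])
    return dict(packages)


# Hypothetical sample data for one biweekly update.
accessions = [
    {"title": "A", "category": 5, "key_words": ["Hazards analysis", "Accident probability"]},
    {"title": "B", "category": 13, "key_words": ["Radiochemical plant safety"]},
    {"title": "C", "category": 2, "key_words": ["Hazards analysis"]},
]
profiles = [
    {"participant": "P1", "weights": {"Hazards analysis": 2, "Accident probability": 1},
     "target": 3, "categories": {1, 5, 17, 18}},
    {"participant": "P1", "weights": {"Radiochemical plant safety": 3},
     "target": 3, "categories": set()},
    {"participant": "P2", "weights": {"Hazards analysis": 2},
     "target": 2, "categories": set()},
]
packages = run_sdi(accessions, profiles)
```

Each entry of `packages` corresponds to one accordion-folded package; the printing of the address card and feedback card is outside the scope of the sketch.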


TABLE 1. Typical profile of an SDI recipient


Key words                                   Assigned weight


Safety system                               2
Safety review (operations, experiments)
Reliability analysis
Reliability system
Operations report analysis
4 Hazards analysis
Inspection and compliance
Administration, control, practices
Operating limits/technical specification
Accident probability
Staffing, training, qualification
Radiochemical plant safety
Radiochemical processing
Categories 1, 5, 17, 18


Target score 
The special card stock is preprinted for the Center at
a cost of about 1¢ per card. The computer sorting and
printing costs and mailing costs increase the expense per
card to about 5¢.


• SDI Feedback and Adjustment


Feedback from the SDI recipients makes possible ad- 
justments to individual profiles to reduce the quantity of 
citations of no interest. The overall feedback response 
has been very good; in fact, cards have been returned 
by over 80% of the participants. When analysis of the 
returns from a participant indicates that relevancy can 
be substantially improved, his profile is reviewed and 
efforts are made to make it more responsive to his needs. 
Physicist A will be used to illustrate how this was done. 
Initially this scientist was profiled to receive all of cate- 
gory 6 on “Reactor Transients, Kinetics, and Stability.” 
Tabulation of the returns from the first feedback cards 
was as follows for 146 citations: 


Documents of much interest 32% 
Documents of some interest 17% 
Documents of no interest 51% 


Category 6 covers both analytical and experimental 
studies of reactors and critical facilities. An examination 
of the key words assigned to the documents that were of 
no interest to Physicist A revealed that many of them 
were concerned with experimental determinations or with 
critical assemblies. As a result, the profile was altered 
so that reports with either of the key words, “measure-
ment, general,” or “criticality safety,” would not be
cited. This was done by giving these two terms weights
of “-1” with the target score held at “0” so that all
other references in Category 6 would continue to be 
selected. Results of the next four feedback cards re- 
vealed some improvement in the pertinence, with a total 
of 56 citations breaking down as follows: 


Documents of much interest 41% 
Documents of some interest 25%
Documents of no interest 34% 
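
The negation device used for Physicist A can be shown in a few lines. This is a minimal sketch that assumes the Category 6 restriction has already been applied; the function name `cited` and the key word "reactor kinetics" are hypothetical. With a target score of 0, an item carrying none of the negated terms scores 0 and is still selected, while a single negated term drops the score to -1 and suppresses the citation.

```python
def cited(key_words, weights, target=0):
    """True when the summed key-word weights reach the target score."""
    return sum(weights.get(kw, 0) for kw in key_words) >= target


# The two terms given weights of -1 in the first profile adjustment.
negated = {"measurement, general": -1, "criticality safety": -1}
```

Under these weights `cited(["reactor kinetics"], negated)` holds, whereas an item indexed with "measurement, general" is no longer sent.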


Since there was room for further improvement, the
next step taken was to communicate directly with Physi-
cist A for more insight into his actual interest. This
disclosed that he wanted all references in Category 6
that entail the use of reactor transient parameters but
that he was not interested in the determination of these
parameters. This more explicitly confirmed our analysis
of the early feedback cards and suggested further profile
alterations. With this information and further study of
the key words assigned to the irrelevant documents, a
longer list of key words was developed for negation.
The following were allotted weights of “-1” and assigned
to the group of two terms already designated:

Analytical model
Critical experiment
Doppler effect
Flooding coefficient
Metal water reaction
Shock wave
Temperature coefficient
Void coefficient


After this change was made, feedback cards from
Physicist A were reviewed. A total of 20 references
was cited with the following results:


Documents of much interest 65%
Documents of some interest 25%
Documents of no interest 10%


This example illustrates clearly how individual profiles
can be adjusted to close in on the user's actual needs.
Further adjustments of this particular profile are prob-
ably not warranted since it has reached a range of “no
interest” below which it is difficult to make significant
progress. Indeed, we feel that it is probably undesirable
to reduce the irrelevant figure below 10%. To do this
would increase the likelihood that some relevant docu-
ments would not be cited at all. Therefore, we have
concluded that an SDI profile is functioning satisfactorily
if the number of irrelevant items cited falls in the range
10 to 20%.
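
The acceptance rule stated above lends itself to a small check. The sketch below is only an illustration: the function name and the three status labels are hypothetical, and the 10 and 20% bounds are taken directly from the text.

```python
def profile_status(pct_no_interest, low=10, high=20):
    """Classify a profile from the percentage of citations of no interest."""
    if pct_no_interest > high:
        return "adjust"          # too many irrelevant citations
    if pct_no_interest < low:
        return "too tight"       # relevant documents may be missed
    return "satisfactory"


# The three feedback rounds reported for Physicist A.
rounds = [51, 34, 10]
statuses = [profile_status(p) for p in rounds]
```

Applied to the three rounds for Physicist A, the rule calls for adjustment after the first two and accepts the profile after the third.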


• Quarterly Bibliography


The first computer output that NSIC distributed was 
the quarterly indexed bibliography. It was issued in
April 1965 and contained the first 670 references that 
were stored on the computer tapes. The references were 
sorted into the 19 categories listed below into which the 
subject of nuclear safety has been divided. 


1. General safety criteria
2. Siting of nuclear facilities
3. Transportation and handling of radioactive materials
4. Aerospace safety
5. Accident analysis
6. Reactor transients, kinetics, and stability
7. Fission product release, transport, and removal
8. Sources of energy release under accident conditions
9. Nuclear instrumentation, control, and safety systems
10. Electrical power systems
11. Containment of nuclear facilities
12. Plant safety features
13. Radiochemical plant safety
14. Radionuclide release and movement in the environ-
ment
15. Environmental surveys, monitoring and radiation
exposure of man
16. Meteorological considerations
17. Operational safety and experience
18. Safety analysis and design reports
19. Bibliographies

When the references are indexed, each is assigned to 
at least one of the categories, but it may be assigned to 
as many as three, if appropriate. In the latter case, the 
accession information is printed in full in each of the 
categories rather than being cross referenced. The in- 
formation for each accession is the same as that dis- 
played on the typical SDI accession card in Fig. 1. 
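
The category rule for the bibliography can be sketched briefly. The record layout and the function name `build_bibliography` are hypothetical; the sketch shows only that an accession assigned to one to three categories is repeated in full under each, rather than cross referenced.

```python
from collections import defaultdict


def build_bibliography(accessions):
    """Group accessions by category number, repeating multi-category items."""
    sections = defaultdict(list)
    for acc in accessions:
        cats = acc["categories"]
        if not 1 <= len(cats) <= 3:
            raise ValueError("an accession needs one to three categories")
        for cat in cats:
            sections[cat].append(acc["title"])  # full entry in each section
    return {cat: sections[cat] for cat in sorted(sections)}


# Hypothetical accessions; the second is assigned to two categories.
sample = [
    {"title": "X", "categories": [6]},
    {"title": "Y", "categories": [5, 6]},
]
bibliography = build_bibliography(sample)
```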
The bibliography program prints the categorized ac-
cessions, key words, and personal author indexes on 14
x 17 in. computer sheets. These sheets are taken two at
a time and photographically reduced by about 40% to
give 8 x 11 in. negatives. The pages for the bibliography
are then printed from these negatives. Since the pages
are numbered by the computer, and the title page, fore-
word, and distribution, etc., are printed by the computer,
no editorial or graphic arts work is required in the
process. The last issue of the quarterly bibliography
contained about 300 pages and 1,000 references. Over
[?],000 copies of each issue of the bibliography are distrib-
uted by NSIC.
• Computer Volumes and Running Times

At present there are over 10,700 items on the Master
Tape, and the vocabulary contains almost 1,000 key
words (2). It takes around 12 min of IBM 7090 time to
enter 140 items on the Master Tape. Each key word
assigned to an item is checked against the vocabulary
authority to make sure that it is an authorized key
word. New key words must be so designated in order
to be added to the authority.
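
The vocabulary check made on input can be illustrated as follows. This is a sketch under assumptions, not the Center's maintenance program: the function name and the way new key words are flagged (a separate `new` set) are hypothetical.

```python
def check_key_words(key_words, authority, new=()):
    """Return the key words that are neither authorized nor designated new.

    Key words designated as new are added to the authority set in place.
    """
    rejected = []
    for kw in key_words:
        if kw in authority:
            continue
        if kw in new:
            authority.add(kw)    # explicitly designated, so accept and record
        else:
            rejected.append(kw)  # probably a misspelling or unapproved term
    return rejected


# Hypothetical authority list and one incoming item.
authority = {"Safety system", "Hazards analysis"}
rejected = check_key_words(
    ["Safety system", "Tritium behavior", "Hazrds analysis"],
    authority,
    new={"Tritium behavior"},
)
```

In this example the misspelled term is returned for correction, while the term designated as new joins the authority.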
SDI. The number of scientists receiving SDI service
has grown from 50 to over 800 since the program started
in October 1965. A scientist may have more than one
profile of interest, and these profiles now exceed 1,000.
All together, the profiles query 183 authors, 93 corporate
authors, 2,375 categories of information, and 4,821 key
words. One scientist has 9 profiles querying 20 categories
and 310 key words.
All question information is held in the computer core,
and each item from the Master Tape is worked against
it. One item may drop for several participants, and a
subsequent sort is used to pull together all drops for one
participant. At present it is possible to process some 200
profiles with only one reading of the Master Tape. The
tape is then rewound, and the next group of profiles is
processed similarly.
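
The grouping of profiles into tape passes amounts to simple arithmetic, sketched below. The function names are hypothetical, and the figure of 200 profiles per pass is the approximate one given in the text.

```python
import math


def tape_passes(n_profiles, per_pass=200):
    """Number of Master Tape readings needed for n_profiles."""
    return math.ceil(n_profiles / per_pass)


def batches(profiles, per_pass=200):
    """Yield successive groups of profiles, one group per tape reading."""
    for start in range(0, len(profiles), per_pass):
        yield profiles[start:start + per_pass]


# About 1,000 profiles, the order of magnitude quoted above.
groups = list(batches(list(range(1000))))
```

On these assumptions, 1,000 profiles require five readings of the tape, and the 165-profile run cited in the following paragraph fits in a single reading.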
Total computer time for an SDI run that dropped 375
items for 165 profiles was around 15 min. Of this 15
min, the actual questioning and selecting required only
2 min. The rest of the time was taken up in sorting,
formatting for output, etc.
Bibliography. Ten quarterly bibliographies have been
issued, each containing 600 to 1,000 references. The last
issue required 24 min of IBM-7090 time for sorting
into the NSIC categories and preparation of the key
word and personal author indexes.

• Principal NSIC Services


A variety of informational services is offered by NSIC
to the nuclear community. The principal ones are sum-
marized below. The computer outputs discussed in more
detail above are included here for completeness. The
Nuclear Safety Journal is available by subscription only.
All other NSIC services, including state-of-the-art re-
ports, may be obtained from the Center without charge
by persons who are active in the nuclear field. Those who
do not fall in this category may purchase such reports
for a nominal sum from the Clearinghouse for Federal
Scientific and Technical Information, Springfield, Virginia.

State-of-the-Art Reports. These reports provide a 
mechanism for the individual staff members to analyze 
and evaluate the experimental and theoretical data de- 
veloped in their particular subject area. They are very 
comprehensive and require several man-months of tech- 
nical effort to produce. Subjects that have been or are 
currently being covered include: (1) iodine monitoring 
practices (3), (2) iodine behavior in reactor containment 
systems (4), (3) reactor containment practices (5), (4) 
reactor secondary shutdown systems (6), (5) nuclear
safety research and development projects (7), (6) U.S.
nuclear standards (8, 9), (7) air-cleaning systems (10),
(8) reactor pressure vessel integrity (11), (9) reactor 
operating experiences (12), (10) international nuclear 
standards (13), (11) height of rise of effluents, and (12) 
tritium behavior. 

Nuclear Safety. The technical progress review Nuclear 
Safety, while separately funded, is prepared by NSIC. 
Recent developments in nuclear safety are concisely
reviewed as prevailing interest and available information
warrant (14). The journal may be purchased by subscrip-
tion from the Superintendent of Documents, U.S. Govern-
ment Printing Office, Washington, D.C. 20402, for $3.00
per year.

Indexed Bibliography of Accessions. The bibliographic 
accessions of the Center are published quarterly. This 
computer output is sorted according to the 19 NSIC 
categories and issued with key words and personal author 
indexes. So far, 10 have been published (16-24). A 
bibliography of all the accessions in Category 3 has also 
been published (26). 

Selective Dissemination of Information. Bibliographic
citations on continuous 5 x 8 in. cards are mailed to 
participants on a biweekly basis. The citations are 
selected according to each participant’s profile of interest 
as the computer tapes are updated. The operation of 
this important program is described above in some detail. 

Technical Inquiries. Inquiries for nuclear information 
are currently being received and answered, free of charge, 
by telephone, letter, or personal contact at a rate of 
40 per month. This is more than double the rate at 
which requests were received in 1965.

Counseling and Guidance. The NSIC staff is available 
to visitors for counseling and guidance on nuclear safety 
problems in its subject areas. Visits to the Center for
the purpose of consulting with the technical staff or 
using the information storage files occur at a rate of about 
12 per month. 

Special Bibliographies. The computer files may be 
queried for special retrospective searches at any time. 
The output is a bibliography appropriate to the particular
search combination of key words, authors, date, and/or
evaluation. The information printed in the output is
the same as that for the SDI; that is, the same as that
displayed on the accession card in Fig. 1.²


• Organizational Structure


The Nuclear Safety Information Center is funded 
through the office of the Assistant Director for Nuclear 
Safety, Dr. J. A. Lieberman, in the AEC Division of 
Reactor Development and Technology. The AEC Divi- 
sion of Technical Information collaborates in sponsorship 
of the Center. The Center is organized at the laboratory 
as a semi-autonomous group, responsible to an Assistant 
Laboratory Director, who is in turn responsible for all 
informational problems; but it is administered within 
the ORNL Reactor Division. At the present time the 
staff is composed of 20 technical specialists (most of 
whom are on a part-time basis), a technical editor, two
information specialists, and four secretary-typists as given 
in Table 2. A staff scientist or engineer is assigned to each 


² Requests for NSIC services or additional information on these services
may be addressed to: Nuclear Safety Information Center, Oak Ridge
National Laboratory, P.O. Box Y, Oak Ridge, Tennessee 37830.


TABLE 2. Organization of the NSIC


Wm. B. Cottrell, Director
Joel R. Buchanan, Assistant Director
Jeanne Thomas, Secretary
H. B. Whetsel, Technical Editor
Celia Murphy, Chief Information Specialist
Shirley Hendrix, Information Specialist (part-time)
Ann Hayes, Information Specialist and Secretary
Janet Davis, Secretary
Dianne Lane, Secretary

Staff Scientists and Engineers 


J. P. Blakely, Physical Chemist
J. R. Buchanan, Nuclear Engineer
Wm. B. Cottrell, Nuclear Engineer
E. N. Cramer, Nuclear Engineer
. K. Ergen, Physicist
. H. Fontana, Nuclear Engineer
. D. Swisher, Meteorologist
. G. Jacobs, Soil Chemist
. W. Keilholtz, Physical Chemist
. G. Lawson, Nuclear Engineer
. F. Lomenick, Geologist
. C. McClain, Geophysicist
. A. McLain, Nuclear Engineer
. G. Merkle, Mechanical Engineer
. B. Piper, Nuclear Engineer
. B. Ruch, Chemical Engineer
. L. Scott, Physicist
. B. Shappert, Nuclear Engineer
. S. Walker, Electrical Engineer
. L. Winton, Nuclear Engineer
subject area.


of ORNL's nuclear safety research and development


of categories 6, 7, 9, 11, 14, 15, 16, and 17 on a half-time
basis. Surveillance in the other areas is on a lesser


The director of the information center is coordi


are the backbone of the NSIC. In fact, as it was so
aptly expressed in the “Weinberg Report” in 1963, “The
essence of a good technical information center is that
it be operated by highly competent working scientists
and engineers—people who see in the operation of the
Center an opportunity to advance and deepen their own
personal contact with their science and technology (25).”


• Future Plans 


Just recently, NSIC received delivery of the equipment 
for a computer telecommunications system, the IBM-2260, 
which will be located at NSIC and will have 
capability for (1) direct data input to the computer, 
(2) scanning and querying of the computer data files, and 
(3) direct maintenance of the computer files. Information 
may be displayed on cathode-ray screens for scanning 
and modification. The devices will be coupled to 
the IBM-360 at the Computing Technology Center. 
All these information-processing techniques represent 
significant improvements in NSIC's information system and 
will enable it to function efficiently as part of the national 
information system of the future that will encompass 
science and technology. 




References 


1. BUCHANAN, J. R., and W. B. COTTRELL, Operating 
   Experience of the Nuclear Safety Information Center, 
   March 1963-March 1965, USAEC Rep. ORNL-TM-1136, 
   Oak Ridge National Laboratory, Oak Ridge, Tenn., 
   May 17, 1965. 

2. NSIC Key-word Thesaurus, USAEC Rep. ORNL-NSIC-35, 
   Oak Ridge National Laboratory, Oak Ridge, Tenn., 
   August 1967. 

3. COWSER, K. E., Current Practices in the Release and 
   Monitoring of ¹³¹I at NRTS, Hanford, Savannah 
   River and ORNL, USAEC Rep. ORNL-NSIC-3, Oak 
   Ridge National Laboratory, Oak Ridge, Tenn., 
   August 1964. 

4. KEILHOLTZ, G. W., and C. J. BARTON, Behavior of Iodine 
   in Reactor Containment Systems, USAEC Rep. 
   ORNL-NSIC-4, Oak Ridge National Laboratory, 
   Oak Ridge, Tenn., February 1965. 

5. COTTRELL, W. B., and A. W. SAVOLAINEN, U.S. Reactor 
   Containment Technology: A Compilation of Current 
   Practice in Analysis, Design, Construction, Test, and 
   Operation, Vol. I and II, USAEC Rep. ORNL-NSIC-5, 
   Oak Ridge National Laboratory, Oak Ridge, Tenn., 
   August 1965. 

6. WALKER, C. S., Secondary Shutdown Systems of Nuclear 
   Power Plants, USAEC Rep. ORNL-NSIC-7, Oak 
   Ridge National Laboratory, Oak Ridge, Tenn., 
   January 1966. 

7. BUCHANAN, J. R., and N. F. CROSS, Current Nuclear 
   Safety Research and Development Projects, USAEC 
   Rep. ORNL-NSIC-10, Oak Ridge National Laboratory, 
   Oak Ridge, Tenn., June 1966. 

8. COTTRELL, W. B., and ASA SUBCOMMITTEE N6.9, 
   Compilation of U.S. Nuclear Standards, 2nd ed., 1965, 
   USAEC Rep. ORNL-NSIC-11, Oak Ridge National 
   Laboratory, Oak Ridge, Tenn., December 1965. 

9. COTTRELL, W. B., Compilation of United States Nuclear 
   Standards, 3rd ed., USAEC Rep. ORNL-NSIC-30, 
   Oak Ridge National Laboratory, Oak Ridge, Tenn., 
   December 1966. 

10. KEILHOLTZ, G. W., Air Cleaning [...] 
    Processes, Testing, and Nuclear Applications, USAEC 
    Rep. ORNL-NSIC-13, Oak Ridge National Laboratory, 
    Oak Ridge, Tenn., September 1966. 

11. MILLER, E. C., The Integrity of Reactor Pressure 
    Vessels, USAEC Rep. ORNL-NSIC-15, Oak Ridge 
    National Laboratory, Oak Ridge, Tenn., May 1966. 

12. U.S. ATOMIC ENERGY COMMISSION DIVISION OF 
    OPERATIONAL SAFETY, Abnormal Reactor Operating 
    Experiences, USAEC Rep. ORNL-NSIC-17, Oak Ridge 
    National Laboratory, Oak Ridge, Tenn., August 1966. 

13. COTTRELL, W. B., and ASA SUBCOMMITTEE N6.9, 
    Compilation of National and International Nuclear 
    Standards (Excluding U.S. Activities), 2nd ed., 1966, 
    USAEC Rep. ORNL-NSIC-18, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., June 1966. 

14. Index to Nuclear Safety, A Technical Progress Review, 
    By Chronology, Permuted Title, and Author, Vol. 1, 
    No. 1 Through Vol. 7, No. 4, USAEC Report 
    ORNL-NSIC-31, Oak Ridge National Laboratory, Oak 
    Ridge, Tenn., January 1967. 

15. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-1, 
    USAEC Rep. ORNL-NSIC-8, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., April 1965. 

16. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-2, 
    USAEC Rep. ORNL-NSIC-9, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., August 1965. 

17. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-3, 
    USAEC Rep. ORNL-NSIC-12, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., November 1965. 

18. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-4, 
    USAEC Rep. ORNL-NSIC-14, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., March 1966. 

19. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-5, 
    USAEC Rep. ORNL-NSIC-16, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., June 1966. 

20. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-6, 
    USAEC Rep. ORNL-NSIC-19, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., September 1966. 

21. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-7, 
    USAEC Rep. ORNL-NSIC-20, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., November 1966. 

22. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-8, 
    USAEC Rep. ORNL-NSIC-32, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., March 1967. 

23. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-9, 
    USAEC Rep. ORNL-NSIC-34, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., 1967. 

24. NUCLEAR SAFETY INFORMATION CENTER STAFF, Indexed 
    Bibliography of Current Nuclear Safety Literature-10, 
    USAEC Rep. ORNL-NSIC-36, Oak Ridge National 
    Laboratory, Oak Ridge, Tenn., August 1967. 

25. SHAPPERT, L. B., and R. S. BURNS, Indexed Bibliography 
    on Transportation and Handling of Radioactive 
    Materials, USAEC Rep. ORNL-NSIC-33, Oak Ridge 
    National Laboratory, Oak Ridge, Tenn., June 1967. 

26. THE PRESIDENT'S SCIENCE ADVISORY COMMITTEE, Science, 
    Government, and Information, U.S. Government 
    Printing Office, Washington, D.C., 1963. 


American Documentation — October 1967 24] 


Opinion Paper 


The Interpretation of SDI Data 


Although a large number of Selective Dissemination 
of Information (SDI) Systems have been planned, 
implemented, and tested over the past few years, 
insufficient attention has been given to the collection 
and interpretation of important data needed for 
evaluation. We describe some of the defects common 


• Introduction 


During the past 5 years numerous reports of Selective 
Dissemination of Information (SDI) Systems, under 
development, in operation, proposed for implementation, 
being tested, etc., have appeared in the literature.¹ (1-56) 
It seems to me that almost all of these reports have some 
serious deficiencies. Most seriously, they provide little or 
no ground for comparison with other systems.² 

As a participant in the development of the first SDI 
Systems (from 1959-1963), I became sensitive to problems 
of SDI evaluation and have continued to be concerned 
with the proper application of evaluative measures. 
I would like, therefore, to discuss the general 
problem of SDI System evaluation, using as a vehicle 
for the discussion an analysis of the recent paper by 
C. R. Sage on the evaluation of the Ames SDI System. 
(56) I chose the Sage report for particular attention first, 
because it is the most recent, and second, because it 
provides, in considerable detail, the illusion of having 
performed significant analyses and evaluation of the SDI 
data. 


• General SDI Features 


Before examining the Ames System in detail, some 
basic clarifications of the important features of all SDI 


1 The extensiveness of the SDI literature was not apparent to me when 
work began on this paper. I started the literature search as the 
work proceeded and the list of References 41-56 is the result. Although 
the list is not exhaustive, it should be helpful since the only survey 
of SDI literature is the now outdated work of Hensley (Reference 214). 

2 Two exceptions to this indictment are the systems described in 
References Y and 44. 
to almost all of the reported systems, single out one 
recent report for detailed discussion, and argue in 
favor of collection and correct interpretation of data 
on one important and frequently overlooked evaluation 
factor. 


T. R. SAVAGE 


Control Data Corporation 


Systems need to be made. The basic universe of discourse 
(or population) that is pertinent to SDI Systems 
is, as I have urged elsewhere, (57) not simply users (U) 
or documents (D), but their product (UD). Another 
important, but frequently overlooked, item of information 
is the percentage of UD that results in notices to 
the users. This might be called the “selective reaction” 
of the system. Note that UD represents all possible 
notices that might be sent, that is, everybody gets 
everything; and the purpose of SDI is to reduce this total. 

Although it has become commonplace, since the results 
of the Cleverdon work have become widely known, (58-60) 
to talk in terms of relevance and recall as evaluation 
measures, these can be seen to be not really appropriate 
when one realizes: first, that the object of our study 
is UD rather than some subsets of it; and second, that 
evaluation measures should be viewed as statistical esti- 
mates of system performance. To revert to an early 
terminology used elsewhere, (10) we can label the four 
boxes of the familiar 2 x 2 matrix (shown in Table 1) 
as: 1-Hits; 2-Trash; 3-Miss; 4-Pass. Trash and miss 
can be easily recognized as analogs of our old statistical 
friends Type 1 and Type 2 error. A reasonably satis- 
factory first step toward providing some evaluation 
measure is simply to compute the reciprocal of the sum 
of miss and trash. This provides higher scores for 
systems with less error and vice versa. This measure 
is comparative (like a scratch test for hardness) and 
not metric. The metricizing of evaluation measures is 
a separate and complicated problem that will not be 
treated here. 
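
The four boxes and the comparative score described above can be sketched in a few lines. This is only an illustration: the class and method names are hypothetical, the counts are made up, and dividing miss and trash by UD before taking the reciprocal is an assumption made here so that systems of different size can be compared (the text says only to take the reciprocal of the sum of miss and trash).

```python
from dataclasses import dataclass


@dataclass
class UDMatrix:
    """Counts for the four boxes of the 2 x 2 matrix over UD (hypothetical helper)."""
    hits: int    # system selected, user accepted
    trash: int   # system selected, user did not accept
    miss: int    # system did not select, user would have accepted
    passed: int  # system did not select, user would not have accepted

    @property
    def ud(self) -> int:
        # UD is the whole population of user-document pairs.
        return self.hits + self.trash + self.miss + self.passed

    def selective_reaction(self) -> float:
        # Fraction of UD that results in notices to users.
        return (self.hits + self.trash) / self.ud

    def error_score(self) -> float:
        # Reciprocal of (miss + trash) / UD; the normalization is an assumption.
        # Undefined (division by zero) for a system with no error at all.
        return self.ud / (self.miss + self.trash)


# Made-up counts for two systems, used only to show the comparison.
system_a = UDMatrix(hits=40, trash=10, miss=10, passed=940)
system_b = UDMatrix(hits=40, trash=60, miss=40, passed=860)

# The system with less error gets the higher score.
print(system_a.error_score() > system_b.error_score())
```

As the text notes, such a score only ranks systems; the distance between two scores carries no metric meaning.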

What is yet unknown about any information system 
(SDI or otherwise) is, in any precise form, the relative 


TABLE 1. Two × two matrix showing subsets of UD 

                         Accepted by user    Not accepted by user 
System selected          1. Hits             2. Trash 
System not selected      3. Miss             4. Pass 


values that users (either individually or collectively) 
assign to these four factors. Users that are starved for 
information will tend to value hits and not be disturbed 
by large amounts of miss or trash. Users flooded with 
information will value pass and strongly disvalue trash, 
etc. 


• The Ames System 


In the concluding paragraph of Sage's paper, he states: 

   The relative percentage of interest computed as illustrated 
   in Section IV of this article may tend to appear 
   low compared with other SDI Systems. However, our 
   only criteria for measuring any degree of worthwhile 
   service is through the reactions and comments of our 
   users. 

Apparently he regards the second sentence as somehow 
an explanation for the first. Of course, it is not. To draw 
an analogy (as Taube did) (61) between our field and 
the practice of medicine, Sage’s statement could be 
rephrased as: 

   It's true that my patients die as a result of my treatment. 
   However, the only criterion for measuring any 
   degree of worthwhile (medical) service is through the 
   reactions of patients. 

It is unfortunate that Sage has taken this rather disingenuous 
attitude toward his system, since (1) the paper 
itself provides much evidence to account for the Ames 
System's deficiencies and (2) despite the detailed analyses 
of the notifications supplied by the Ames System, there 
apparently is no method used to determine miss.³ 

In the Ames evaluation test there were two separate 
document populations under study: those from Nuclear 
Science Abstracts (NSA) and those from the Science 
Citation Index (SCI). Both were run against 21 User 
Profiles. For NSA, UD was 532,988.⁴ The number of 
notices generated was 11,226, for a selective reaction of 
.02. For SCI, UD was 1,456,308. The number of notices 
generated was 6,038, for a selective reaction of .004. This 
difference in selective reaction by a factor of 5 was 
occasioned by a difference in threshold value of only a 
factor of about 1.7. When one looks for the source of this 
discrepancy, it is apparently found in the average depth 
of indexing of the two document populations. NSA had 
a depth of 43.1 terms and SCI a depth of 112 terms. 
Although I have been unable to determine from Sage’s 
paper how the matching function and threshold calculation 
actually work, it would appear that they simply 
produce more notices as an almost direct function of the 
number of words in the document profile. Noticing this 
problem at all, however, requires attention to the selective 
reaction of the system. 

3 Frequently the argument is used that we cannot adequately determine 
miss without an exhaustive examination of the entire document 
collection. This is simply a mistaken notion of scientific procedure. 
Random sampling as used in Reference 9 is a perfectly acceptable and 
straightforward method for determining miss. 

4 These numbers, as well as those shown in Table 2, were computed 
from the average data supplied by Sage. 
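
The selective reaction figures quoted for the two Ames populations can be rechecked directly from the notice counts and UD values given in the text. The function name below is illustrative only; the numbers are those reported above.

```python
def selective_reaction(notices: int, ud: int) -> float:
    """Fraction of all user-document pairs (UD) that produced a notice."""
    return notices / ud


# Figures as reported in the text for the Ames evaluation test.
nsa = selective_reaction(notices=11_226, ud=532_988)
sci = selective_reaction(notices=6_038, ud=1_456_308)

# Print both reactions and how many times larger the NSA reaction is.
print(f"NSA: {nsa:.3f}  SCI: {sci:.3f}  ratio: {nsa / sci:.1f}")
```

The ratio of the two values is what the text rounds to "a factor of 5".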

In his treatment of “impartial” responses, Sage excludes 
these from his evaluation calculations. This is not 
quite fair. Strictly these should be counted as trash. 
Recomputing the results on this basis gives the results 
shown in Table 2. 

The hit ratios, or percentages of sent notices that the users 
judged of interest, are 30% and 31%, respectively. These 
are very low indeed. 

Since the major differences between the Ames SDI 
System and others with which I am familiar are the use 
of the significance values, the threshold calculations, and 
the feedback adjustments, these features may be the 
source of some of the difficulty. The data for the whole 
of 1965 (Sage’s Section IV) show 81 users matched 
against 177,180 documents for a UD of 14,351,580; 
54,018 notices were generated for a selective reaction of 
.004. Sufficient data are not provided to calculate hit 
ratios, but one could expect the results to be similar to 
those shown above. 

For comparison purposes, let us look at the data for 
three other SDI Systems that have been developed and 
tested.⁵ These are shown in Table 3. The Ames System 
fares rather badly in all categories. 

Strictly, of course, the numbers in both Table 2 and 
Table 3 should be shown as statistical estimates with vari- 
ances, levels of confidence, and preferably confidence 
limits indicated for each value. On intuitive grounds, we 
usually assume that miss and pass are somehow more 
"estimates" than hits and trash, because the former are 
measured indirectly, while the latter are directly com- 
puted. This view is mistaken for at least three reasons. 


5 In Table 3, "SDI-1" is the system described by Reference 9, 
“KRAFT” is the system described in Reference 24, and “SDI-2” is 
the system described in Reference 3. The data in this report were 
supplemented with unpublished data available to the author. 


TABLE 2. Ames SDI system data 

                  NSA                      SCI 
            Number    % UD          Number     % UD 
Hits         3,387    .006           1,896     .001 
Trash        7,839    .014           4,142     .003 
Miss             *       *               *        * 
Pass       521,706    .98        1,450,270     .996 

* In the Ames System miss is unknown and has been lumped with pass. 


TABLE 3. 

                SDI-1               KRAFT                 SDI-2 
            Number  % UD       Number     % UD       Number    % UD 
Hits           415   .06       17,350     .009       14,629    .023 
Hit Ratio       41*                66*                   68* 
Trash          507   .09        8,950     .005        6,641    .01 
Miss           385   .08            †        †      227,808    .236 
Pass         5,273   .49    1,833,700     .986      370,972    .607 
SR                   .13                  .014                 .036 

* These are not %’s of UD, of course, but % of the sum of Hits and 
Trash. 

† Again, Miss was unknown and lumped with Pass. 


In the first place, the basis on which we accept a value 
as correct is not how it is measured, but rather, the 
confidence, statistically estimated, with which we can assume 
the value is the true one. Secondly, since hits and trash 
are defined in terms of individual users' responses, any 
aggregate numbers for whole systems are subject to the 
same statistical treatment which we apply to any other 
sampling of a population. (Remember that the selective 
reaction is in fact a method, and hopefully a biased one, 
of sampling UD.) Finally, the main purpose of obtaining 
evaluation data at all is to use them as predictors of 
system operation. All predictors are, of course, estimates 
of future performance. 


• The Problems of Miss 


Aside from the fact that the Ames System operates at a 
rather low level of efficiency,⁶ its failure to provide for 
an estimate of miss leaves us in a rather poor position 
for recommending any effective means of correction. The 
simplest method of estimating miss is to send some notices 
randomly to the users, determine the hit ratio of the 
random notices, use this percentage as an estimate of the 
good user-document combinations, multiply all of UD by 
this percentage to get an estimate of the actual number 
of good user-document combinations, and subtract from 
this number the good notices actually sent (hits). The 
remainder is then the estimate of miss. 
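
The procedure just described can be followed step by step in a short sketch. The function name, its parameters, and the example figures are hypothetical; flooring the result at zero is an added safeguard against sampling noise and is not part of the procedure in the text.

```python
def estimate_miss(ud: int, hits: int, random_sent: int, random_accepted: int) -> float:
    """Estimate miss from randomly sent notices, following the steps in the text."""
    if random_sent <= 0:
        raise ValueError("need at least one random notice")
    # Hit ratio of the random notices estimates the share of good user-document pairs.
    good_fraction = random_accepted / random_sent
    # Multiply all of UD by that share to estimate the number of good pairs.
    good_pairs = good_fraction * ud
    # Good pairs that were not sent as hits are the estimated miss.
    # Flooring at zero is an assumption added here, not taken from the text.
    return max(good_pairs - hits, 0.0)


# Hypothetical figures, chosen only to illustrate the arithmetic.
estimated = estimate_miss(ud=100_000, hits=900, random_sent=500, random_accepted=10)
print(estimated)
```

Because the estimate rests on a sample, it should properly be reported with a confidence interval, in keeping with the earlier remarks on statistical estimates.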

The reasons for providing estimates of miss are not 
widely appreciated and perhaps should be enumerated 
here. In the first place, the common use of hit ratios is 
deceptive if used alone as a measure of performance. In 
most SDI Systems it is a relatively simple matter to 
obtain a high hit ratio by requiring a higher proportion 
of match between the user and document profiles. This 
reduces the number of notices and sends only those 
with a high probability of being judged relevant. But, 
also (because of the reduced notices), this increases miss. 
Secondly, without an estimate of miss, we have no 
method of determining the a priori match between our 
user and document populations. If, for example, we 
matched MEDLARS documents to AEC Scientists, we 
may get a very low hit ratio, but miss may also be very 
low, indicating that the users and documents just don’t 
match each other. In such a case, we are better advised 
to select another document (or user) population before 
trying to modify our system. In the SDI-2 case, noted 
above, miss was very high, indicating a high a priori 
match between user and document populations, so that 
the high hit ratio is shown, in part at least, to be an 
artifact of the user’s a priori interest in the documents. 
One system reports a hit ratio of .91.⁷ This is very high 
indeed. Without, however, an estimate of miss, we don’t 
know whether to applaud the SDI System or the acquisitions 
department. 

6 Curiously, the system operating costs of Ames fare badly when 
compared with SDI-2. The SDI-2 processing rate was, approximately, in 
minutes, .0018 × UD, while the Ames processing rate is, approximately, 
in minutes, .0070 × UD. Ames runs on an IBM 7074/1401. SDI-2 
was run with a Fortran program on an IBM 704. 

There is some subtle, and, as yet, unknown relation 
among selective reaction, the a priori match between 
users and documents, the ability of users to absorb notices 
(or documents), and the number of documents the system 
chooses to process. In SDI System operation, there are 
two factors of importance: (1) the product of the read- 
ing (or scanning) speeds of the users and the time the 
users have available for SDI and (2) the amount of time 
the user is willing to wait without receiving some (un- 
known amount of) relevant notices before abandoning 
SDI altogether. These numbers are not easy to obtain, 
but serious attempts should be made to estimate them, 
lest the SDI System overburden the user with paper or 
starve him of notices. Estimates of miss can help in 
this task. 

For example, if the system selective reaction is high 
and the a priori match is high, the system must restrict 
the number of documents it processes in order not to 
flood the users with notifications. On the other hand, 
if both the selective reaction and the a priori match are 
low the system must process many, many documents in 
order to generate enough notices to give any service at 
all. For simple economic reasons high selective reaction 
and high a priori match are to be preferred. The first 
of these is a system processing problem, and the second 
an acquisitions problem. Without knowing miss, we don’t 
know which. 
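
The trade-off in this paragraph can be put in rough numbers. Since notices are a fraction (the selective reaction) of UD, the average user receives about that fraction of the documents processed. The sketch below rests on that averaging assumption; the function names and the capacity of 50 notices are hypothetical, while the two selective reactions are taken from the tables above.

```python
def notices_per_user(selective_reaction: float, documents: int) -> float:
    """Average number of notices one user receives when `documents` are processed."""
    return selective_reaction * documents


def max_documents(selective_reaction: float, user_capacity: float) -> int:
    """Largest number of documents that keeps the average user within capacity."""
    if selective_reaction <= 0:
        raise ValueError("selective reaction must be positive")
    return int(user_capacity / selective_reaction)


# A user who can absorb about 50 notices per period (hypothetical capacity).
capacity = 50
for label, reaction in [("SDI-2", 0.036), ("Ames, 1965", 0.004)]:
    limit = max_documents(reaction, capacity)
    print(f"{label}: at most about {limit} documents per period")
```

A low selective reaction means many more documents must be processed to deliver the same number of notices, which is the economic point made above.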

The third reason for emphasizing the collection of data 
on miss is that we force on the system operators the 
necessity for accounting for system error, and thereby 
have grounds for recommending improvement. 

Finally, obtaining estimates of miss and trash gives us, 
for the first time, data for actual comparisons among 
alternative systems. 


7 This is the Douglas System as reported in Reference 89. 
8 This apparently is exactly the difficulty encountered at Ames. 


• Conclusion 


SDI remains the least expensive, most effective and 
most easily evaluated system to use as a base of information 
services. Unless, however, the system is evaluated 
correctly and the results of the evaluation are judiciously 
used to modify and improve performance, the advantages 
of SDI quickly disappear. SDI Systems, although simple 
in concept, are, like any dynamic system interacting with 
humans, complex in actual operations. 


Brief Communications 


Suggestions on How to Read Experimental 
Material in Information Science 

1. Understanding experimental work in information 
science takes active participation on the part of the reader. 
Read the purpose, proposed methods, and results or 
conclusions in every report first. Then read the whole 
actively, not passively, and with a critical eye. The more 
material read, the easier it becomes to understand because 
much experimental work falls into general patterns, and 
is based on a few fundamental ideas which are not too 
difficult to recognize. 

Watch for sweeping statements. These are usually in the 
introductory sections, the conclusions, or in projections into 
the future or into other fields. Adopt a “show me” 
attitude regarding every statement of fact. 

2. Do not be scared of big words, mathematical formulas, 
tables of figures, or fancy diagrams. Look up undefined 
words in an unabridged dictionary. If they are not there 
and they are not explained in the text and you cannot 
make sense of the report without knowing what they 
mean, do not waste your time reading it further. The 
author is not trying to communicate, but has some other 
motive in publishing. 

Skip the mathematical formulas and look at the concepts 
or conclusions which they represent. Since mathematics 
is a shorthand language for what is explained in the text, 
very rarely are formulas needed to understand the work. 

Add up the amounts in tables or figures and, if you can, 
recalculate them. These often come to something like 154% 
or to 31 when the author says he used 25 examples. The 
odd figures mean he has done something to or with them. 
This may be legitimate. If so, it should be explained in 
the text; if not, see if you can tell what he did. Sometimes 
the odd figures mean that the experiment has several 
variables operating at the same time. 

Are the diagrams well described and well labeled? Is it 
clear what they are supposed to be showing? Can you 
draw other conclusions from them besides the ones drawn 
by the author? Can you see how he drew his conclusions 
from them? 

3. In work in indexing or classification, is the experimenter 
operating under illusions regarding bibliographic 
systems? Does he understand what present systems are like 
and how they work? Is there any indication that he has 
ever seen a classification schedule? Does he know about 
the syndetic part of subject heading? Does he realize the 
reasons for making authority files or otherwise having exact 
records? Does his use of terminology show acquaintance 
with present and past literature? Can you recognize old 
ideas in new words? Watch particularly for repetition of 
Ranganathan’s ideas. 

Does the author realize that the user’s question may not 
be the question he really wants answered? Does he show 
any familiarity with the basic reference tools in the area 
covered? Does he think there is only one type of user 
(the kind who thinks like him)? Does he appear to believe 
that only one field or area or subject is significant, timely, 
or promising? You will think of similar questions as you 
read. 
4. In any report, read the parts describing input 
especially carefully. Look for such statements as, “Terms 
were selected . . .” Ask, “By whom? From what source?” 
This usually means human selection by intuitive means. 
The human part can be extremely well hidden and you 
do not find it unless you deliberately look for it. 

Look to see what the experimenter did with his input 
material. Has he altered it, selected certain parts from it 
by exercising human choice and judgment? What would 
have happened if he had left it alone? Watch for evidence 
of human effort in anything called “automatic.” Many 
"automatic" systems are automatically static unless changes 
or new initial input receive human attention. 

5. If the experimenter used a computer, it often means 
he counted something. What did he count? He may also 
have compared items and performed a further operation 
on the basis of the result. Or he may have repeated an 
operation or series of operations on a definite number of 
items. What labor did he save by using the computer? 
What time? When you take away the computer, what does 
his system do? What does it do that a manual system 
cannot do? What does it not do that a manual system 
does? 

6. If the experimenter used a statistical method, did he 
use a sample large enough to be valid? Were his selection 
or sampling methods adequate? If you cannot find out any 
other way, ask a statistician, actuary, econometrist, psy- 
chometrist, or sociometrist. In information science, samples 
tend to be too limited and too small to be more than a 
rough indication of what might be the case. Think of the 
size of the field under consideration before you accept any 
figures. A sampling of 500 Sanskrit scholars could be 
definitive while a sample of 500 chemists could be quite 
inadequate. 

7. How clearly does the experimenter describe his 
method? Scientific work is supposed to produce publicly 
verifiable conclusions. Could you repeat his experiment 
yourself? Does he want you to take his word or the word 
of an unidentified “specialist” for a statement of fact? 
Does he appeal to authority not available to you for 
verification purposes instead of producing evidence? This 
is often the case where a value judgment is made. 

Does he make comparisons without showing you both 
sides? For example, the experimenter may say his system 
works better than “conventional” classification, without 
making any attempt to show how it is better than, say, 
Dewey, Universal Decimal, Bliss, Library of Congress, or 
Colon. This is especially likely if he “proved” all existing 
classification systems inadequate in a sweeping statement at 
the beginning of the report. Make the comparison yourself 
by lining up old and new side by side and demonstrating 
their treatment of the subject. Surprisingly few experi- 
mental classifications can survive this treatment. 

Does the experimenter tell you that he has tested some 
new work of another experimenter and it is not satisfactory, 
without showing you what he thinks it should have done in 
order to be satisfactory? Is he using some scale of values 
which you are not permitted to see? 

8. Examine the output. Look carefully at the material 
from which the experimenter drew his conclusions. Would 
you draw the same conclusions from this data? Has he 
given you enough data? Does the experimenter go beyond 
his data in drawing conclusions? Does his experimental 
method have application beyond a narrow field? Try the 
method on another field. Will it produce results in any 
subject except the one on which he tried it? 

9. A large proportion of experiments are poorly conceived, 
poorly designed, poorly executed and poorly interpreted. 
Can you see anything that the experimenter may 
have missed? Can you improve on his experiment? Watch 
particularly for evidence of lack of knowledge and experi- 
ence in the field the experimenter is covering as this may 
lead to discovery of operating variables of which he was 
unaware. 

10. Finally, many experimenters in information science 
were originally “hard” scientists and are not accustomed to 
writing lucid and readable prose. Reduce their sentences to 
basic English wherever possible. Change the passive voice 
to the active and put in a subject (usually I). Omit the 
superfluous verbiage and translate big words to little ones. 
Most reports turn out to be reasonably simple after such 
treatment. 
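
The remark in point 6, that the size of the field matters when judging a sample, can be illustrated with the standard margin-of-error formula for a proportion under simple random sampling, including the finite population correction. This is one narrow statistical reading of the point and says nothing about how representative a sample is; the field sizes of 600 and 100,000 are hypothetical and are not taken from the text.

```python
import math


def margin_of_error(sample: int, population: int, z: float = 1.96, p: float = 0.5) -> float:
    """Approximate margin of error for a proportion, with finite population correction."""
    if not 0 < sample <= population:
        raise ValueError("sample must be between 1 and the population size")
    standard_error = math.sqrt(p * (1 - p) / sample)
    # The correction shrinks the error as the sample approaches the whole field.
    if population > 1:
        correction = math.sqrt((population - sample) / (population - 1))
    else:
        correction = 0.0
    return z * standard_error * correction


# Hypothetical field sizes: a small field nearly covered by 500 respondents,
# and a large field of which 500 respondents are a tiny fraction.
small_field = margin_of_error(sample=500, population=600)
large_field = margin_of_error(sample=500, population=100_000)
print(small_field < large_field)
```

The same 500 respondents pin down the small field much more tightly than the large one, which is the sense in which a sample can be close to definitive for one field and only a rough indication for another.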


PHYLLIS A. RICHMOND 
Library 
University of Rochester 


The Training Implications of 
Automated Personnel Systems 


INTRODUCTION 


Whenever an automated system of any type is installed 
there will always be training required: training to acquaint 
employees with the new methods at installation time, and 
continuing training of the people in the computer depart- 
ment to keep them abreast of the rapid technological 
changes in automation methods. In computerized personnel 
systems, in particular, we can identify four major areas of 
training responsibility. We must train: 


1. People whose records are being processed; 
2. People who prepare input data; 

3. People who run computer systems; 

4. People who use computer outputs. 


TRAINING PEOPLE WHOSE RECORDS ARE PROCESSED 


An automated personnel system from purely a systems 
designer's point of view is quite like an automatic production 
status system or a computerized inventory control 
procedure. But in automated personnel processing, each 
record in the system does not represent a box of hubcaps; 
it represents a flesh and blood human being. Each punched 
card can represent a man who may have serious doubts 
about any machine’s ability to properly handle his personnel 
affairs, or on the other hand, he may have a genuine fear 
of his statistics being processed by an all powerful 
“electronic brain.” 

This fear of the ability of the computer is most drama- 
tically expressed by the enormous increase in voluntary tax 
return received by the Internal Revenue Service 
since they began to computerize their record keeping. 

As a result of over 3,000 personal interviews with a general 
cross section of the American public, Dr. Robert S. Lee of 
Columbia University recently wrote that the uneasiness 
about the computer as an “all knowing super device” causes 
even more concern than the fear of displacement through 
automation. 

With this as a background, what then is our training 
responsibility toward the men whose data are being pro- 
cessed by the machines? We must assure each man, that 
under the automated system two things will happen: 


1. He will receive objective and impartial treatment; 
2. His treatment will not be impersonal. 


We must assure him, for example, that if we plan to
search the personnel files for all those men with over 5
years of accounting experience and a reading knowledge of
French, and he has these qualities, he will be selected.
Or, that we could impartially locate all men whose absences
from the job exceed 15 days per year.
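
As an illustrative sketch of such an impartial search, the selection rule can be written as a single test that is applied identically to every record. The record layout and field names below (years_accounting, reads_french, absence_days) and the sample records are assumptions made for this example; they are not taken from any actual personnel system.

```python
from dataclasses import dataclass


@dataclass
class PersonnelRecord:
    # Hypothetical fields chosen for illustration only.
    name: str
    years_accounting: float
    reads_french: bool
    absence_days: int


def qualifies(rec: PersonnelRecord) -> bool:
    """Over 5 years of accounting experience and a reading knowledge of French."""
    return rec.years_accounting > 5 and rec.reads_french


def excessive_absence(rec: PersonnelRecord, limit_days: int = 15) -> bool:
    """Absences from the job exceeding the yearly limit."""
    return rec.absence_days > limit_days


def search(records, rule):
    # The same rule is applied to every record, in file order.
    return [rec.name for rec in records if rule(rec)]


# Made-up sample file.
records = [
    PersonnelRecord("Adams", 7, True, 3),
    PersonnelRecord("Baker", 5, True, 20),
    PersonnelRecord("Clark", 9, False, 16),
]
selected = search(records, qualifies)
absentees = search(records, excessive_absence)
```

Because the rule is stated once and never varies from record to record, every man who meets it is selected, which is the sense of "objective and impartial" used here.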

We must assure the man through proper training that
even if his processing is done by machine, the contact
with him will be by another person who can explain what
the computer has produced and how it affects him, and who
can make allowances for extenuating circumstances not
programmed into the system.

A friend of mine was recently selected by the Army’s
automated personnel system for overseas assignment. There
was no question that this man was due for his overseas
rotation at this time, so the computer did its job properly.
However, the people in the personnel assignment section
delayed his departure for 1 month while his wife recuperated
from a serious operation. This is what is meant by objective
yet not impersonal action.

The complexity of the system and the number of people 
controlled will bear directly on the training method, but 
usually carefully prepared lectures on introductory com- 
puter processing, and a clearly written, well-illustrated 
pamphlet on the system will do the training job. If done 
properly, they will orient the staff toward a realistic
appreciation of what the computer will be doing and will
stimulate their cooperation and active support while you
are installing the system.


TRAINING PEOPLE WHO PREPARE INPUT DATA


The person we are talking about here is the man at the
interface between the source of the raw data and the
system itself. Specifically, we mean the individual who
prepares input forms containing the data that enters the
system. This individual may be the same man whose records
are processed but usually he is some type of coder,
such as a bookkeeper in an accounting office or a clerk in a
military pay system.

The reason for the training is that the coder must have
a crystal clear understanding of what data to enter into
the system, how it is to be represented, and precisely where
on the forms it is to be placed. If we cannot establish
this data entry discipline through proper training then
the entire system will fail. Without valid input we have
nothing.

There are really only two requirements to insure the 
effectiveness of the input training effort: (1) Keep the 
coding instructions simple, and (2) Continually retest and 
retrain the coders.

The instructions to the coders must be in plain English. 
We recently assisted a major accounting firm in installing 
a large cost control system, where the instructions to the 
coder required her to “place the net receipts in domain A, 
and the gross receipts in domain B” on the coding sheet. 
All we had to do to reduce the input errors by 4% was to 
change the word “domain” to the word “position” in the 
procedures. 

In many cases an imaginative systems designer and 
trainer can convey the coding requirements simply and 
with very little text by using the decision table technique. 
A properly designed decision table can easily hold, on one 
sheet, the information from five or six sheets of written 
text to quite easily direct the coder to the proper entry. 
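
A minimal sketch of the decision table idea follows. The conditions, the receipt kinds, and the sheet positions are all hypothetical, loosely echoing the net and gross receipts example above; a real table would carry the rules of the system being installed.

```python
# Each row maps a combination of conditions to one action:
# (receipt kind, is it a refund?) -> position on the coding sheet.
# Every condition and position here is assumed for illustration.
DECISION_TABLE = {
    ("net", False): "A",
    ("gross", False): "B",
    ("net", True): "C",
    ("gross", True): "D",
}


def position_for(kind: str, is_refund: bool) -> str:
    """Look up where the coder should enter the amount."""
    try:
        return DECISION_TABLE[(kind, is_refund)]
    except KeyError:
        # A combination missing from the table is flagged, not guessed.
        raise ValueError(f"no rule for kind={kind!r}, refund={is_refund}")
```

The point of the layout is that every combination of conditions appears exactly once, so the coder (or a checking program) reads across one row instead of working through pages of prose.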

If many coders are to be trained this is a made-to-order 
application for the programmed instruction training tech- 
nique. The method is successfully used by the Internal 
Revenue Service in training their staff in visual editing 
of some tax returns, and by a number of computer manu- 
facturers in the training of the coding aspects of computer 
programming. Programmed instruction texts can be ex- 
pensive (about 50 hours of writing to each hour of net 
instruction) but can be a very efficient way of training 
large groups of coders. 

Because the input coding function is so critical, a one-time
training effort will usually not suffice. Therefore, a
routine must be established in the automated system design
itself to determine the error frequency by type for each
coder so that management can decide when it is economic
to recall coders to reinforce their understanding on a data
entry problem.
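
One way such a routine might be sketched is shown below. The error log format, the coder identifiers, the error type names, and the recall threshold are all assumptions for illustration; the decision on when recall is economic remains with management.

```python
from collections import Counter, defaultdict


def error_frequency(error_log):
    """Tally rejected entries by coder and by error type.

    error_log is an iterable of (coder_id, error_type) pairs.
    """
    tally = defaultdict(Counter)
    for coder_id, error_type in error_log:
        tally[coder_id][error_type] += 1
    return tally


def coders_to_recall(tally, threshold):
    """Coders whose count for any single error type reaches the threshold."""
    return sorted(
        coder
        for coder, counts in tally.items()
        if any(n >= threshold for n in counts.values())
    )


# Made-up log of rejected input entries.
log = [
    ("C01", "wrong position"),
    ("C01", "wrong position"),
    ("C01", "missing field"),
    ("C02", "missing field"),
    ("C01", "wrong position"),
]
tally = error_frequency(log)
recall = coders_to_recall(tally, threshold=3)
```

Keeping the counts by type as well as by coder shows not only who needs retraining but on which data entry problem.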


TRAINING PEOPLE WHO RUN COMPUTER SYSTEMS


Probably the most obvious training implication of an 
automated personnel system is the training of the people 
who will run the system. Here we are concerned with key 
punchers, tab operators, computer operators, programmers, 
and systems analysts. 




One of the most vital issues here is the question of who
will be trained. When any automated system is installed
there is usually some displacement of clerical personnel.
The extent to which these people can be retrained for
responsibilities in the computer department is limited, in
my opinion, mainly by the imagination of management, not
by the ability of the worker. When the new computer
installation is staffed with retrained employees, we have
found it to have a far greater chance of success than one
staffed by new hires. Aside from the obvious morale
advantages, the reason for this is really quite simple. It is
usually easier to train someone in computer techniques
than it is to teach a man your business.
Computer training is becoming a highly developed art,
and if properly trained, a man can be writing productive
programs in just a few months. We, for example, over
the past 3½ years have trained and placed hundreds of
programmers who have completed our 25-session,
2 night per week evening course. I dare say that it
would take many many times longer than that to train a
man in the intricacies of civil service personnel actions.
It is interesting to note that organizations who have for
various reasons been forced to follow this retraining policy
have found it to be a blessing in disguise. Many banks,
with outstanding ADP results, are forced to train from
within rather than hire fully experienced programmers who
might demand a salary in excess of that of one of the vice
presidents. Many progressive railroads hampered by union
seniority restrictions have found that the
manual-operations-experienced man is exceptionally valuable in the
computer section.


TRAINING PEOPLE WHO USE COMPUTER OUTPUTS


Probably the most universally neglected training
responsibility is that of training the user of the computer
outputs, namely, the manager who is expected to act based
upon the computer generated report he receives.
It is unfortunate that in our current state of highly
developed third generation software and hardware systems
the training of the executive who prescribes and uses
these systems is usually only a stepchild of other training
programs. Management training is frequently just a
generalization of a computer concepts course for beginning
operators or a watered down version of a computer
programming course.
We strongly feel that the content of the training for an
executive must be something unique to his needs. We
recently completed a series of 62 depth interviews into the
automation training needs of executives from research
organizations, insurance companies, retailers, construction
firms, banks, and government agencies. The executives
were surprisingly consistent in their responses and
expressed a need to receive training in two major areas:


1. An ability to recognize the management considerations
   in computer processing.
2. An ability to react to a management consideration with
   the appropriate management technique.


More specifically the executives wanted to receive detailed
training in the following five areas:


1. How to identify potential systems applications
2. What a manager must know about programming
3. How to prepare a feasibility study
4. How to control the accuracy of data in a computer
   system
5. How to organize for computer processing.


The list of five areas we mentioned is valid for the
state of the computer art as it stands today. It is valid
for the needs of personnel record keeping. The future, however,
will be sure to expand on this list of training topics
when applications such as automated personnel
systems using computer generated simulations are used, or
when the growing popularity of the linear programming
technique is used in wage and salary evaluation. Both of
these techniques have been already successfully applied in
these areas.


SUMMARY


In summary then we see that to insure an efficient auto- 
mated personnel system we must train four major groups 
of individuals: 


1. People whose records are processed 

2. Coders who prepare input 

3. Those who operate the computer system 

4. Managers who use the outputs to make decisions 


The effective training of people to work efficiently in a
computer environment, in our opinion, represents one of 
the major challenges to modern personnel administration. 


MICHAEL J. RAUSEO 
Management Research Associates 
Arlington, Virginia 


Documentation in Thailand 


Thailand, which some may know better by its older
name of Siam, is still largely an agricultural country, al- 
though a considerable amount of industrial development 
has started in recent years. The value of scientific research 
as a means of accelerating the country’s development is 
now fully recognized by the government. The first important
step taken by the government towards improving
scientific research was the establishment of the National
Research Council of Thailand as a central body to advise
the government on scientific policy.

The National Research Council, in its turn, recommended
that the Government should take two further important
steps. The first of these two steps was the establishment
of a Thai National Documentation Center, to operate a
full range of documentation services, including a national
science library. The second major recommendation was
for establishment of an Applied Scientific Research Corporation
of Thailand, which would function as a semigovernmental
body for the purpose of setting up and operating
national research institutes in the main fields of the applied
sciences.

As a matter of fact, one will agree that the National
Research Council asked for these two projects in the correct
order. It was essential that the initiation of the Documentation
Center should precede the expansion of the research
activities. Such research work as was already in progress
in the country was being severely handicapped by the lack
of modern organised facilities for obtaining information.
The planning of new research institutes and of their programs,
even before they went into operation, would require
the services of an efficient documentation center. Only on
this condition could the research program be established on
a rational basis.

The National Research Council therefore took action to
set up a national documentation center. In 1961, on a
request from the Government of Thailand, UNESCO
started providing specialized aid for this project under the
U.N. Technical Assistance program. The aid from UNESCO
is mainly in the form of services of three expert
advisers for several years, to assist in planning and
implementing the project, and approximately $25,000 in
equipment and several fellowships for training the staff of the
center.

The Government, through the National Research Council,
provided funds to the amount of three quarters of a million
dollars for the building and equipment of a modern
documentation center. This center, the Thai National
Documentation Center, went into operation in May 1964, and
is at present in a stage of rapid expansion of its services.

The TNDC, to give the center its short title, is designed 
to carry out a number of functions. It is a national docu- 
mentation center, providing services of document procure- 
ment, bibliography compilation, and translation for science 
and industry throughout Thailand. It includes a national
science library, with approximately 2,000 square meters of
stack rooms and reading room. The Center also provides
the special library services for the research institutes of 
the Applied Scientific Research Corporation, which are 
being established on the same site as the TNDC. 

Now to say a few words about the general position regarding
scientific library and information services in Thailand,
against which the value of this new development, the
establishment of the TNDC, can be assessed.

Thailand has at present about 60 scientific libraries, all
but one or two of them being located in the capital,
Bangkok. They range in size and effectiveness from two or
three fairly large libraries with a well-qualified and
experienced librarian to small, poorly stocked collections with
untrained or part-time librarians. Interlibrary collaboration
is better than might be expected, thanks to the existence
of an active Thai Library Association, but is still hampered
by various factors, including the restrictions that are
commonly due to government control. The stock and services
of these libraries are usually available to a limited range of
users. One or two of the larger libraries provide documentation
services, such as procurement of microfilm or photographic
copies of papers, to their users; the majority of
them, however, do not attempt to go beyond the collection
and use of their own stock.

The combined resources of these libraries, even if they 
were all available for general use, would still be very small. 
On a reasonable estimate, perhaps one in ten of the pub- 
lished documents that a scientist in Thailand might need 
to consult would: be available in the country. The remain- 
ing nine tenths are at present obtainable only by procure- 
ment from abroad. In such circumstances, it is obviously 
not possible to carry out serious scientific work. 

In order to improve this situation, the immediate program
of the TNDC is to fill the huge gaps by providing a
low-cost, fast service of scientific document procurement by
modern photographic methods, available to all who can 
use it. Bangkok libraries are made fully available through 
this service, since the TNDC has compiled a union card 
catalogue of periodicals held in these libraries and can 
supply microfilms or photocopies of any article from them 
on request. For the remaining 90% of papers, the TNDC 
has established contact with some 30 documentation centers 
throughout the world from which microfilm copies can be 
obtained by the TNDC on request. 

This service, together with other supporting services such 
as bibliography compilation and translation, at least enables 
the scientist to get on with his work. It is recognized by the
TNDC, however, that the full long-term solution requires
much more than this.

The TNDC is therefore working in various ways to
reduce the dependence of the Thai scientist on time-consuming
and expensive procurement of individual papers
from abroad. The Center's own library is being stocked as
rapidly as possible, particularly with important publications
that are not otherwise available in Thailand. Encouragement,
help, and advice to other libraries is an avowed part
of the policy of the TNDC, since it is realized that the
specialized scientist needs the direct services of his own
special library in addition to those of the national center.
In its training programs for its own staff, the TNDC is
recognizing that the keen, carefully selected young people
who are now learning the intricate operations of scientific
documentation in the Center will one day almost certainly
be drawn away to organize and operate specialised centers
in the more important branches of science and technology.

The steps described above will, it is hoped, keep the
development of scientific documentation in Thailand in its
proper position, that is, ahead of the needs of the research
workers. The execution of such a program is, of course,
not without its problems, but the excellent support of
the government is making it possible to solve these as they
arise. The more urgent needs, both in the TNDC and in
the scientific libraries, are for training facilities, particularly 
study abroad, and for aid in the massive problem of building 
up the stocks of libraries to a reasonable level. 

We in Thailand are fully aware that collaboration is a 
two-way operation. We are grateful for the opportunity 
afforded to us by other countries under which we can 
draw on their knowledge, their libraries and their services.
For our part we are making every effort to collect and
process the scientific literature of Thailand, and we will
always be glad to deal with any requests or inquiries in
relation to that literature.

A few words must also be mentioned about the recent
progress of the Thai National Documentation Center.
Practically, the TNDC has been established to provide a
variety of documentation services both to the staff of the
Applied Scientific Research Corporation of Thailand and to
research workers and technical personnel in other institutions
in Thailand.

The center was officially inaugurated December 2, 1964, 
and the growth of the demands on the "responsive" 
services can be seen by comparing the demands on these 
services during the first and second halves of 1965. The 
numbers of requests received were: 


                            Jan.-June 1965   July-Dec. 1965

Document procurement              186              452
Bibliography compilation           20               26
Translation                        25               30


The library continues to grow rapidly. The book intake
in 1965, 2,500 volumes, was equal to the total intake of the
previous 3 years of preliminary collection. Research reports,
bulletins, etc., received during the year numbered 7,100.
The number of scientific periodicals being received by
subscription, exchange or gift was increased during the year
from about 700 to over 1,000 titles, and good progress
was made in building up the library's holdings of back volumes.

The demands on the microfilming and photocopying
services reflected the growing demands on the documentation
services. There was also an increasing demand on
the facilities for making copies of scientific documents
provided by the user organization. Comparative figures for
the two halves of the year 1965, with the annual totals, were:


                             Jan.-June   July-Dec.   Total for 1965

Microfilm (frames)                512      18,620        19,138
                                                     (3,828 strips)
Photoprints from microfilm      3,897       4,562         8,450
Direct photocopies                800       1,922         2,722
Jobs requested                    117         359           476


The services of the printing facilities were also in heavy 
demand. The offset printing presses carried out jobs totaling
1,065,410 impressions (single-side printing), while the
duplicating machines produced 232,200 impressions. 

Last but not least, “List of Scientific Reports Relating to
Thailand, List No. 2” was published in December 1965 with
the inclusion of 2,115 items in various fields of science and
technology.


CHUN PRABHAVI-VADHANA 
Thai National Documentation Centre
Bangkhen, Bangkok, Thailand


Scientific and Technical Information 
in Japan* 


Mr. Chairman, Ladies and Gentlemen: 


It is my pleasure to introduce to you today the general 
situation in Japanese scientific and technical information. 
There are many subdivisions within the field of scientific 
and technical information, so I feel it would be better if 
I focus on major organizations and their backgrounds and 
activities in this broad field in Japan. 

I have chosen to talk about the Japanese Ministry of
Education, the National Diet Library, the Japan Science
Council, and the Japan Information Center of Science and
Technology.

The Ministry of Education was established as an integral
part of the Government in 1871. It exercises a profound
influence upon the majority of those producing
and using scientific information through those of its functions
which affect all aspects of education, its control of
personnel and policies and appropriations in almost all of
the national universities. It is further influential through
its subsidizing programs for governmental and private research
institutes, academic societies, and through the publications
these organizations issue.


* Speech given at Science and Technology Information luncheon
group on Thursday, May 4, 1967.

The Bureau of Higher Education and Science is one
of seven major divisions within the Ministry of Education,
and this Bureau consists of nine subdivisions. One of
these subdivisions is the Scientific Information Section
created in August of 1952.

The advisability of establishing a large national information
center for science and technology was recognized
early, about 1950. In May of 1951 the Japan Science
Council, to which I shall refer later, after discussing the
matter in a special committee, advised the government to
establish a large science information center with the further
recommendation that this center be developed within the
Ministry of Education. The Scientific Information Section
was a preliminary to the larger step. However, the necessary
budget for a large center was not forthcoming, so
the Scientific Information Section's activities continued on a
limited scale, with some growth achieved.

The Scientific Information Section launched its work
by preparing a union catalog of foreign-language scientific
books and periodicals in major Japanese collections. Before
that time any one university, for instance,
would have scant information about the collections in other
universities.

The Information Section is also charged with broadcasting
Japan's scientific achievements abroad, so to speak,
and with promoting the international exchange of science
information.

The Ministry of Education’s most important present
contribution to the development of science information
activities is its financial and technical assistance to the
publication of the Japan Science Review. The Review is published
in three separate sections: (1) Mechanical and Electrical
Engineering, (2) Medical Sciences, and (3) Biological
Sciences. Each section contains bibliographies and abstracts
in its respective field.

The total budget of the Ministry of Education for 1966
amounted to some 700 million dollars, of which 10 million
dollars was granted in aid for basic research. Of the latter
amount, $200,000 was devoted to the publication of research
results, where emphasis was laid on secondary publications
with an allotment of $50,000. This was distributed
mainly among academic societies to assist them in the
publication of periodicals and scientific books.

The second organization is the National Diet Library. By tradition,
libraries in Japan have been regarded as private properties
of individuals, organizations, and institutions. The
National Diet Library was created in 1948 by Parliament
to serve as a national center of library activities.

Plans for the National Diet Library organization were
developed with the advice of leading officers of the United
States Library of Congress.

The National Diet Library consists of the Central
Library and 33 branch libraries. Branch libraries, which
were formerly independent of one another, may be classified
into three types. One is the Ueno Branch Library,
formerly the Imperial Library. The second type comprises
the famous private collections such as Seikado and Toyo.
The third type consists of the libraries of government
organizations servicing administrative and technical personnel
in the government.

Formerly, the administrative headquarters and the major
collection of books, journals, and other materials were housed in
the Akasaka Detached Palace, with considerable difficulties,
but many of these difficulties have now been overcome by
the construction of a new National Diet Library building
in 1961, at the administrative center of the national
capital.

The National Diet Library's total volumes numbered
above 5,220,000 in 1960.

The Library's budget for book purchasing was about
$600,000 in fiscal 1966, of which $500,000 was allocated to
the field of science and technology.

The Library offers reference services to Diet members.
The Library also serves as a depository for copies of all
materials published in Japan. The Library coordinates
library activities among government agencies. The Library
furnishes general library and bibliographic service to the
public, and it promotes international exchange of information,
including publishing in foreign languages and
participating in international publication projects. The
Library also prepares and publishes Japanese Periodicals
Index of serials published in Japan.

This series contains two editions, a quarterly of social 
and humanistic sciences, and a monthly of natural sciences. 
As for the latter, its English edition has been issued since 
August 1960, with support from the United States National 
Science Foundation, and 800 copies are being sent regu- 
larly to the United States, and 400 to other foreign 
countries. 

Now the Japan Science Council will be explained. Soon 
after the Second World War, leading scholars in Japan 
formed a Commission of 108 men under the Chairmanship 
of Dr. Kankuro Kaneshige, to review overall problems of 
postwar rehabilitation of intellectual life and organization 
of research. How could Japan be brought up-to-date in 
science and technology after the long years of war time 
isolation? 

The Commission recommendations led to the formation 
of the Japan Science Council as a state organ, with the 
aims of promoting the development of science and permeating
it into the administration and industry, as well as
into the life of the nation. 

The Council consists of a Secretariat and seven major 
Divisions: the Humanities; Law and Politics; Economics; 
Natural Sciences; Engineering; Agriculture; and Medicine 
and Pharmacology. The members of the Council are
elected by their professional colleagues for terms of 4
years. The voting privilege is limited to those who are
professionally active and have graduated from a college
at least 5 years prior to the elections. Candidates for the
Council seats are normally men and women of national
distinction in their own intellectual fields.

The Council provides a forum for the exchange of ideas
at the highest level, and serves as a liaison agency among
many diverse institutions. It lacks authority
to compel acceptance of its advice by the government,
but its decisions and recommendations do carry considerable
weight. Consequently, the primary function of the Council
is that of a policy-making organ. Its secondary function
is to stimulate cooperation among all Japanese institutions
concerned with science and technology. Its tertiary function
is to represent the country in international scientific
activities.

The scientific information activities of the Council con- 
sist of those of the National Committee for Documentation 
and the Library of the Council. 

The National Committee for Documentation is
a subordinate organ of the Council and the national
representative of Japan to the International Federation for
Documentation and other international organizations in
matters of scientific information.

The Library of the Council, which is attached to the
Secretariat, serves the Council. The Library exchanges the
publications of the Council with over 70 countries of the
world.

To stimulate public interest in the nationwide problem
of such activities, the Council carries on a modest but
important publication program based on its library resources,
which consist of 50,000 books and 60,000 journals,
bulletins and so on. It issues discussion papers and it
sponsors public lectures and symposia such as one held in
April 1959, at which important national representatives
discussed control and servicing of scientific information before
an audience representing many professions and all regions of
Japan.

The last is the Japan Information Center of Science and
Technology. When the Prime Minister’s Science and
Technics Agency was established in May 1956, it successfully
lobbied in the Diet for support of an organization
which it proposed to sponsor; and the JICST came into
being in August 1957, as a corporate body contributing
to the development of science and technology from foreign
and domestic sources.

The organization of the center consists of a President,
Vice President, Specialist Committees, a Planning Office, 
a General Affairs Division and an Information Division, 
with a Library and a total staff of about 150. 

It has a capital fund of about one million dollars and
its budget for fiscal 1966 was also about one million
dollars.


Operations are conducted in a new four-story building 
equipped with modern facilities and located advantage- 
ously near the principal government offices and the new 
National Diet Library building.

As a domestic service to Japanese industry, the Infor- 
mation Center provides bibliographies, references, and 
similar documents in all fields except biological, medical, 
and agricultural sciences. It translates foreign papers into 
Japanese upon request, and provides photoreproduction 
services. It offers the same services to foreign clients, in- 
cluding translation of Japanese papers into English. 

The Center also provides abstracting and indexing ser- 
vices. In 1960 about 1,000 Japanese and 2,500 foreign 
journals, patents, and other publications were abstracted 
or indexed. 

For storage and retrieval of the documents, it generally 
uses the Universal Decimal Classification System. A specially
designed electric computer was installed in May
1961. It uses magnetic tapes for storage and retrieval of 
scientific information. 

The publications of the Center are: Foreign Patent 
News; Current Bibliography on Science and Technology; 
and the Center Monthly. 

Foreign Patent News is a weekly, published in Japanese 
and English regarding chemistry. 

Current Bibliography on Science and Technology is issued 
in a fortnightly series of six parts. They are general and 
mechanical engineering; electrical engineering; chemistry 
and chemical industry; geology, mining and metallurgy; 


Letters to 


Dear Sir: 


Considerable research has been undertaken in recent years 
to learn how scientists and technologists keep themselves 
informed of current work in their fields of interest. The 
reports of this research emphasize that the most important 
ways of keeping informed about current work include at- 
tendance at international, national, and local professional 
meetings, the distribution of formal reports and reprints 
. among colleagues, & considerable &mount of personal cor- 
respondence, and a significant amount of oral communica- 
tion at meetings. The study of journal articles, the use 
of abstracting and indexing services, and the use of library 
and documentation services appear to be of secondary im- 
portance in meeting the needs of scientists and technologists. 
An important reason for assigning these methods secondary 
importance is the time lag between the current state of 
research and development activities and the subsequent 
publication of information about them. This period is 
variously estimated to be from 1 to 3 years, even though 
most professional societies continue to exert themselves in 
attempts to shorten this time lag. 

Some .ideas for making it possible for scientists and 
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civil engineering and architecture; and theoretical and ap- 
plied physics. Í 

Publications from far countries are collected via air cargo 
in order to process them in the shortest possible time. 

The Center monthly is issued in Japanese to provide 
information concerning the Center's activities and articles 
on documentation study. 

The Center, in the course of 10 years of operation, has 
established & nationwide reputation. ! 

The Center has also been an associate member of the 
International Federation for Documentation since 1957. 

All four organizations about which I have told you are 
governmental or semigovernmental organizations. In other
words, even nowadays scientific and technical information 
business has not yet appeared in Japan. 

Finally, I would like to talk about the major inadequate 
points in this field. They lie principally in the areas of 
insufficient coordination among related agencies and in
language problems.

One agency still knows little about what others have 
done or are doing and planning. Of course personnel concerned
have been trying to establish close relations with
each other, but we feel there are some gaps as well as 
overlaps in their activities. 

I believe most scientists and engineers in Japan can 
read academic papers in English, but writing papers in
English or some other foreign language is a different problem.
Scientists and engineers prefer to write in Japanese,
to save time. Translation of these papers into various
foreign languages adds extra work to the information
service agencies.

I am optimistic about the future, however. With the
new electric computers we can do anything, eventually. 
They are a great help even now. And furthermore, the 
invention of a translating machine, or the new use of 
computers for that purpose, should lighten the very heavy
burden of translating slowly and writing it out by hand. 

I thank you very much. 


Toury Kixvcnur 

First Secretary 

Embassy of Japan 
Washington, D.C. 


the Editor


technologists to find information about the current status 
of research and development are offered below.¹

In the present age of computerized publication of throw- 
away annual directories, it should be quite feasible for a 
professional society to present more up-to-date information
about the activities of its members by adding to the directory
a key-word index to the research and development
activities of its members. Annual directories of professional 
societies would then consist of a minimum of three parts: 
an alphabetical directory of members, a geographical index, 
and a key-word index to research and development activi- 
ties. 
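The three-part directory proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the member records, the field names (`name`, `city`, `projects`), and the stop-word list are all hypothetical and are not taken from any society's actual files.

```python
from collections import defaultdict

# Hypothetical member records, as a dues-statement questionnaire might return them.
MEMBERS = [
    {"name": "Adams, R.", "city": "Chicago",
     "projects": ["Indexing of patent literature"]},
    {"name": "Baker, L.", "city": "Ann Arbor",
     "projects": ["Machine translation of abstracts",
                  "Citation indexing of theses"]},
]

# Assumed list of words too common to be useful as index terms.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def build_directory(members):
    """Return the three parts: alphabetical list, geographical index, key-word index."""
    alphabetical = sorted(m["name"] for m in members)
    geographical = defaultdict(list)
    keyword = defaultdict(set)
    for m in members:
        geographical[m["city"]].append(m["name"])
        for title in m["projects"]:
            for word in title.lower().split():
                if word not in STOP_WORDS:
                    keyword[word].add(m["name"])
    return (
        alphabetical,
        {city: sorted(names) for city, names in sorted(geographical.items())},
        {word: sorted(names) for word, names in sorted(keyword.items())},
    )


alphabetical, geographical, keyword = build_directory(MEMBERS)
for word, names in keyword.items():
    print(f"{word:12} {'; '.join(names)}")
```

Each significant word of a project title becomes an entry pointing back to the members who reported it, so a reader looking under "indexing" would find both hypothetical members.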

The key-word index will require a way of gathering 
information about research and development activities. All 
professional societies present an annual statement of dues,
and while they collect the necessary money from those who 


1 The stimulus for this communication was found in the description
of the American Psychological Association’s Project on Scientific
Information Exchange in Psychology, in the article by Belver C. Griffith
and William D. Garvey, “Systems in scientific exchange and the effect
of innovation and change,” Proceedings American Documentation
Institute, 1:191-200, 1964.


wish to continue membership as well as members’ addresses
and official titles, customarily they do not gather much
more information. A questionnaire sent with the dues
statement could provide the information for a research 
and development index to the directory. Questions such as 
these could be asked: 

1. Are you currently engaged in research? Describe the 
title of each research project in no more than 10 words and
indicate the starting date and the projected completion date. 

2. Are you engaged in a development activity? Describe 
each activity in not more than 10 words, and include
the starting date and the projected completion date.

3. Have you published reports, articles, books, etc., since
last filling out this questionnaire? Please list your formal
publications with full bibliographical citations, using the
following entries as your models. (models omitted here)
New members are requested to send a complete list of their 
publications. 

4. Have you prepared manuscripts, informal reports, or 
any other writings which you would be willing to share with 
your colleagues? Please list them.

5. Are you willing to send copies of your publications to
the society’s headquarters so that they may be copied and
distributed to all persons who request them? If your answer
is “yes,” please star the entries above to indicate the items 
you are sending with this report. 

If a professional society does not wish to store and copy 
publications for sale to interested persons, it could arrange 
with a library or a commercial service to undertake this 
activity. For example, the journal articles of Chemical 
Abstracts are kept and sold for the Chemical Abstracts 
Service by the John Crerar Library in Chicago, and Uni- 
versity Microfilms in Ann Arbor stores and photocopies 
the doctoral theses listed in Dissertation Abstracts. 

The professional societies which maintain abstracting and 
indexing services in their particular fields might be willing 
to match the bibliographical citations obtained from the 
questionnaires with the citations in their abstracting and 
indexing services. The entries which are not found in the 
abstracting and indexing services could be added to the 
key-word index to research and development activities in 
the current directories. 
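The matching step described above amounts to finding the questionnaire citations that an abstracting service does not already list. A minimal Python sketch follows; the citations and the crude `normalize` rule are hypothetical, and a real service would need a far more careful matching rule than this one.

```python
def normalize(citation):
    """Crude matching key (an assumption): lowercase, letters and digits only."""
    return "".join(ch for ch in citation.lower() if ch.isalnum())


def unmatched_citations(questionnaire_citations, service_citations):
    """Return the questionnaire citations not found in the abstracting service."""
    covered = {normalize(c) for c in service_citations}
    return [c for c in questionnaire_citations if normalize(c) not in covered]


# Hypothetical data for illustration only.
service = ["Doe, J. Indexing methods. J. Doc. 12:1-9, 1965."]
reported = [
    "DOE, J.  Indexing Methods.  J. Doc. 12:1-9, 1965.",
    "Doe, J. Notes on thesauri. Informal report, 1966.",
]

additions = unmatched_citations(reported, service)
print(additions)
```

Only the informal report survives the comparison, and it is that kind of entry which would be added to the key-word index of the directory.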

C. D. Gull

Professor of Library Science 
Indiana University 
Bloomington, Indiana 
Book Reviews


4/67-1R Theory of Self-Reproducing Automata. 1966. 
John von Neumann. Arthur W. Burks, Editor. University 
of Illinois Press. 


John von Neumann made a number of important con- 
tributions to the development of modern computers and 
automata. The present volume has been edited from two
of his unfinished manuscripts and so falls naturally into 
two parts. 

In Part I, “Theory and Organization of Complicated 
Automata,” von Neumann presents his views on extremely 
complicated automata. He begins by discussing computing 
machines in general and what makes them complex or
simple. 

Von Neumann went on to discuss rigorous theories of 
control and information including the statistical informa- 
tion theory of Shannon. He related these results to the 
theory of computability as formulated by Turing. 

It is the last two of the five lectures which are the most 
interesting. Here, von Neumann discusses the role of 
complexity in automata. Many similarities between com- 
plicated automata and the nervous system are discussed. 
It is interesting to note that he was intrigued by the 
numbers of elements involved. There are 10¹⁰ neurons
in the human brain while computers had 2 × 10⁴ tubes
when von Neumann gave these lectures. Current computers
(IBM 360-91) have roughly 5 × 10⁵ transistors so that
natural systems are still much larger than artificial ones. 
When this material was written, a 1,000 word memory was 
standard; now 65,000 word memories are common. On 
the other hand, the memory capacity of a human being 
is not yet known. Computer components are much faster 
than neurons. 

After considering these questions, von Neumann moved 
on to discuss the synthesis of complicated automata by 
other automata. A number of schemes for self-reproduction 
are mentioned. The possibility of evolution and random 
mutation is also considered.

In summary, the first part of the book is a semitechnical 
discussion of the nature of highly complicated automata 
with consideration of the parallels and differences between 
abstract and human automata. These lectures are particu- 
larly pleasant to read. A serious consideration of these 
problems requires a knowledge of many areas such as 
logic, —— theory, information theory, computers, 
etc. Since von Neumann was competent in all these fields 
and utilizes them in his lectures, this volume illustrates 
a first-class mind at work. It is even more impressive when 
one realizes that we live in a time of pathetic
overspecialization.

In Part II, entitled "The Theory of Automata: Con- 
struction, Reproduction, Homogeneity,” von Neumann set 
out to construct a self-reproducing automaton. He began by 
asking five basic questions: 


1. When is a class of automata logically universal?

2. Can an automaton be constructed, i.e., assembled
and built from appropriately defined “raw materials,”
by another automaton? Also what class of automata
can be constructed by one, suitably given, automaton?

3. Can any one, suitably given automaton be “construction-universal,”
i.e., able to construct every other
automaton?

4. Can any automaton construct other automata that
are exactly like it?

5. Can the construction of automata by automata progress
from simpler types to increasingly complicated
types? Can this evolution go from less efficient to
more efficient automata?


Question (1) has a well-known answer due to Turing. 
The other questions are all answered affirmatively by von 
Neumann. This is accomplished by the detailed construction
of a self-reproducing automaton. He starts by
postulating a regular two dimensional grid, each cell of
which is a 29-state automaton. Using these components,
an automaton which is really a (universal) Turing machine 
is constructed. The constructing machine constructs es- 
sentially a tape for a Turing machine and the finite state 
control unit. The complete machine acts as a (universal) 
Turing machine. Thus, Question 2 is answered. Ques- 
tion 3 is reduced to Question 2 by giving a plan for 
converting the constructing automaton into a universal 
constructor. 

Von Neumann reduced Question 4 to Question 3 by show- 
ing how to make the universal constructor reproduce itself. 
The trick is to have a complete description of the con- 
structor on the same tape. Intuitively, this seems impos- 
sible since the constructing automaton must contain a 
complete plan of the constructed automaton and also 
must be able to understand and execute this plan. This
bottleneck can be gotten around by keeping two copies
of the information on the tape. One copy is used in
the construction; the other copy is passed to the second
machine. 
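The double use of the description has a familiar small-scale analogue in a program that prints its own text. The Python sketch below is only an analogy and not von Neumann's construction: the string `TEMPLATE` plays the part of the tape, used once as instructions to be carried out and once as data to be copied into the offspring. All names here are illustrative assumptions.

```python
import contextlib
import io

# The "tape": a description of the program with one slot for a copy of itself.
TEMPLATE = 'TEMPLATE = {!r}\nprint(TEMPLATE.format(TEMPLATE))'


def build_offspring(template):
    """Use the description twice: as the plan to follow and as the data to copy."""
    return template.format(template)


def run(source):
    """Execute a program text and return what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().rstrip("\n")


offspring = build_offspring(TEMPLATE)   # the constructed program
grandchild = run(offspring)             # what the constructed program builds in turn
print(grandchild == offspring)
```

The constructed program, when run, produces a program identical to itself, because it carries its own description along and hands a copy of it to the next generation.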

Question 5 brings in other issues such as the nature of
efficiency. Consequently, this problem is not discussed
in the same detail as the others. 

In the second part of the manuscript, von Neumann 
succeeded in answering all of his questions in the affirmative.
His detailed construction is easy to follow and quite
clever. 

The editor, Professor Arthur Burks, has contributed much 
to the present volume. His many comments are bracketed
in the text; this preserves the original flavor and adds
explanatory material. The casual reader will find much 
of this information helpful and most readers will find the 
chronological material interesting. Advanced readers may
find the tutorial comments tiresome and repetitious, but 
these are easily skipped. 

Professor Burks is to be congratulated for doing such a
thorough job in finishing these manuscripts. All those 
people, myself included, who saw the original manuscript 
can appreciate the work which the editor has done. 


MICHAEL A. HARRISON
University of California 
Berkeley 


4/67-2R Symbolic Shorthand System. 1966. (Rutgers 
Series on Systems for the Intellectual Organization of 
Information, Volume VI.) Hans Selye. New Brunswick. 


89 pp. 


In a brief 10,000 words Dr. Selye and his former librarian, 
George Ember, here attempt to describe the salient features 
of the classification system which Dr. Selye has employed 
in organizing his collection of documents (some 700,000 
items, chiefly journal article offprints and photocopies) 
in the field of endocrinology. This is not a shelf classification,
but a classification meant for the organization of a
classified catalog. There are some 1,800 “class numbers,” 
distributed among 20 main classes; the distinctive feature
of the system is that the class numbers are made up of 


mnemonic symbols, which may be combined in various 
ways according to fixed rules of precedence and order.
Thus, the class number for the thyroid gland is


Tr 
and the class number for lymphomatous thyroiditis is
| Tr-itis-Ly 
and the class number for antithyroid drugs is 
! Trj 
and if we wished to classify an unlikely article which dealt 


with the effect of antithyroid drugs on lymphomatous 
thyroiditis, we would designate the subject as 


| Tr-itis-Ly«*—Tr| 


If this same article also dealt with the action of radioiodine 
in hypothyroidism, this topic could be designated as 


| Tri €—1* 


These not unusual examples illustrate several things: 
the use of truncated syllables suggesting the tissue or the 
process or the substance designated; the use of symbols 
such as overlining, underlining, arrows in various directions,
asterisks; in all there are 31 signs which are nonalphabetic
and nonnumeric. Capitalization or noncapitalization
is significant; while “Im” represents infundibulum,
“IM” represents immunity or hypersensitivity. Further, 
as the example shows, citations are posted in as many 
places in the scheme as the number of topics may demand; 
they are filed in special “divisions” (e.g., the amino acids 
tyrosine [Tys] and diiodotyrosine [Ditys] both file in 
the division of Amino Acids [Amac]); and the element 
to the right (the agent) determines the order of filing,
rather than the element to the left (the target). 

This complex but ingenious system has been used by
Dr. Selye with great success. Selye is a brilliant man,
and has made brilliant contributions to endocrinology and
medicine. His published works are supported by lists 
of references which are astounding in their extent and 
completeness. 

But when all is said and done, the Symbolic Shorthand
System remains the Handapparat of an individual, whose 
interests are highly individual. To Selye, endocrinology 
is not a subject area in the ordinary sense; it is a point 
of view; it is the place on which he stands and from
which he views the rest of the world. 

“It should be emphasized,” says Selye, “that the sys- 
tem is not necessarily limited to a specific disciplinary 
field . . . [it is] a system which is applicable to all branches
of the life sciences.” If by this Selye means that the 
general notions of the system are adaptable to a field 
such as rheumatology, or such as psychiatry, or such as
gynecology, to serve the needs of a particular rheumatologist
or psychiatrist or gynecologist, then we can agree.
But it is inconceivable that the system could be made to 
embrace at once the entire field of biomedicine, serving 
the needs of a large number of persons. 
Selye wistfully asks “how far could the Symbolic Shorthand
System be developed through mechanization”? He
says that systems analysts have felt that the system could 
be mechanized, but he concludes that “the system works 
so well as it is that the inducement to venture into computerization
is not sufficiently attractive.” His intuition
here is quite sound.

The remaining half of this small pamphlet consists of 
a “seminar panel discussion,” of which 50% is contributed
by F. W. Lancaster, formerly associated with the Cranfield
Project and now with the National Library of
Medicine. Mr. Lancaster, an able man, contributes an
interesting discussion of the factors of precision and recall.
When he concludes that “theoretically, at least, the system
has the capability of a performance range from high
recall to high precision” we may be permitted to heave a
sigh; but when he says: “In the example, the Order of
Precedence says that the effect of adrenaline should precede
the effect of cortisone. Suppose, however, we were
in an organization in which we were particularly interested 
in cortisone, or suppose that the — of the document 


expressed cortisone. Then there is no reason why we 
could not reverse the order and produce our own — 
system to suit our own needs,” then we may be permitted
to oppose thumb to forefinger. 
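For readers unfamiliar with the two factors Mr. Lancaster discusses, the standard definitions can be stated briefly: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The Python sketch below uses invented document identifiers purely for illustration; it is not drawn from the pamphlet under review.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers.
relevant = {"d1", "d2", "d3", "d4"}
narrow_search = {"d1", "d2"}                                       # specific class number
broad_search = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}    # general class number

print(precision_recall(narrow_search, relevant))
print(precision_recall(broad_search, relevant))
```

The narrow search is all precision and half recall; the broad search is the reverse. That trade-off is what a "performance range from high recall to high precision" refers to.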

Those seriously interested in exploring the structure of 
this system should consult Symbolic Shorthand System 
(SSS) for Physiology and Medicine, by Hans Selye and 
George Ember, 4th ed., Montreal, 1964 (xxxvi, 238 p.),

90. l 


Frank B. Rogers 
University of Colorado Medical Center 


4/67-3R A Checklist for the Organization, Operation
and Evaluation of a Company Library. 2d rev. ed. 1966. 
Eva Lou Fisher. Special Libraries Association, New York. 


61 pp. 


The nonlibrarian management or library administrator
will find, in Miss Fisher’s checklist, nearly all of the 
questions that should be asked to analyze the library 
requirements and services to be provided in a company 
environment. Following each major question is a list
of references published, with few exceptions, between 1960
and 1966 which have pertinence to the question.

Part I raises 12 general problems of management. 
The reader may follow the cross-references included in 
the text should he desire more detail or allied information 
on any topic. 

Part II covers 26 specific problems of library operations 
from Acquisition to Statistics and for the person who 
wants to start a small, new library there is Part III 
“Where to Start.” 

The concept and general approach is excellent. The 
booklet is an invitation to consider the major problems 
of management and operations and, if the invitation is 
accepted, directs the reader to the literature. The checklist
is, indeed, a valid contribution to the literature of
librarianship. This fact does not mean the compilation is
the complete and total answer on how to organize, operate,
and evaluate the company library.

It may be that the basic organization of the text should 
have been revised and updated as were the references. 
An increasing attention is being given to the use of computers
in libraries. This is reflected in a number of the
papers cited but is handled only in passing in the text. 

The correlation between the text and the references 
cited could be strengthened. In the section on the form 
of library organization, the articles on "Planning the New 
Library" are cited. In most cases these refer to the physi- 
cal layout rather than the organizational structure. This 
reviewer would prefer having them appear in Section X 
of Part II where the topic is “Space.” The practice of 
cross referencing does not lead, in this case, the reader 
of Section X to the article cited under General Problem 3. 

The extensive list of citations in some sections and 
the paucity of them in other sections suggests there are 
areas in library literature which are not covered as well 
as they should be. Library journal editors might review 
the references in the checklist and other general texts and 
develop guide lines for suggesting possible subject areas 
to their writers. 

The failure to use boldface type for the margin head- 
ings in Part I and the lack of consistency in underlining 
make the use of this section a little more difficult than 
for Parts II and III.


G. E. RANDALL 
Research Library, IBM 
Thomas J. Watson Research Center


4/67-4R Anglo-American Cataloging Rules. 1967. Pre- 
pared by the American Library Association, The Library 
of Congress, The Library Association and the Canadian 
Library Association. North American text. American 
Library Association, Chicago. 


A dictionary defines Rules as principles regulating the
procedures or methods necessary to be observed in the
pursuit or study of some art or science. Cataloging rules,
therefore, could be defined as principles regulating cataloging
procedures or methods. Principles, on the other
hand, are defined as fundamental assumptions forming 
the basis of a chain of reasoning. To understand catalog- 
ing rules we must, therefore, understand the principles 
upon which the rules are built. Catalogers, not unlike 
other librarians, seem to be extraordinarily uninterested 
in principles upon which their professional work is based. 
It is safe to presume that most of our activities are based
upon countless questionnaires, ad hoc assumptions, and 
almost always on the present-day conditions or individual 
experiences in the individual library. The best illustration 
of my point can be found on page vi of the new Anglo-American
Cataloging Rules:


It is regrettable that, because of the great size of many
American card catalogs, it was necessary . . . to agree
. . . that certain incompatible American practices be continued
in the present rules.


The above statement is particularly difficult to under- 
stand when one considers the illustrious roster of names that 
decorates the present rules: Seymour Lubetzky, the origi- 
nator and prime promoter of the new rules and also the 
most outspoken advocate of “principles,” was the Anglo-American
Cataloging Rules’ first editor (1956-62); C. Sumner
Spalding was the Rules’ second editor (1962-66); Wyllis
E. Wright was the Chairman of the Catalog Code Revision
Committee; P. S. Dunkin was a member of the Steering
Committee; Ruth C. Eisenhart was a consultant (1961-64);
and Richard Angell was a member of the General
Committee. 

All six of them were members of the seven-man official 
American Delegation to the International Federation of 
Library Associations’ International Conference on Cata- 
loging Principles in Paris in 1961. During the Conference, 
the American delegation agreed to all but two votes
(conference’s principles 10.3 and 12). In spite of this,
the present rules depart from the International Principles
(Principles: 9.12, 11.14, 9.4, 9.6; new code pp. 3-4).

This departure would have been justified if the demands 
of computerized processes were making it obligatory but 
“the problems of machine arrangement of entries in automated
systems were not ignored but no action could be
taken” (p. vi). The new Rules therefore are neither purely 
“international” or “modern,” nor are they strictly based 
upon principles. They are at their best well-edited, typographically
improved, and pleasantly compiled traditional
cataloging rules. 

Upon close examination, we find that the stated prin- 
ciples are so often contradicted that instead of rules 
based upon principles we have again a cataloging code 
based on practices. The entry of the work, for example, 
is based upon the statements that appear: 


1. On the title page

2. On any part of the work

3. In the first work in the collection

4. In the first edition of the work

5. In related work


When there is suspicion or evidence that the statements 
are erroneous or fictitious the entry is based on reference 
sources or on the consensus of scholarly opinion. 

In case of joint authorship the work can be entered
under: 


1. The first mentioned on the title page

2. The author given typographic or wording prominence

or

3. The author whose heading is first in alphabetical
order


In further analysis we discover that not only the respon- 
sibility of the author or representation on the title page 
have to be taken into account but also: 


1. Formal history (as in case of corporate authorship) 

2. Official approval of the institution 

3. SOIN of the library for which the cataloging is 
done

4. English language 

5. Library of Congress practices
Not only do the principles contradict each other, but
the examples also add to the confusion. The rule “enter under
title a work that is of unknown or uncertain authorship” is 
illustrated by: 


La capucinière; ou, Le bijou enlevé à la course. Poème
(possibly by Pierre François Tissot; erroneously attributed
to Pierre Jean Baptiste Nougaret)


Two examples later, the following occurs: 


A true character of Mr. Pope 
(author uncertain, generally attributed to John Dennis)
Main entry under Dennis. 


Also entered under title are: 


Bibliography of agriculture. National Agricultural Library.

and Who’s Who in designing ... Members of International
Association of Clothing Designers.

but Biographical directory of the American Political Science
Association ... Edited by Franklin L. Burdette.
Washington, D.C. American Political Science Association.

and Buyers’ guide, British Jeweler’s Association.

are entered under the associations.


Form of name also breaks the stated rules. The following
are selected examples:

Philip II, King of Spain not Felipe II, King of Spain
but Jan III Sobieski, King of Poland not John III Sobieski,
King of Poland

Catherine II, Empress of Russia not Ekaterina II, Empress
of Russia and Nasser, Gamal Abdel not Abd al-Nasir, Jamal

but Evtushenko, Evgenii Aleksandrovich not Yevtushenko,
Yevgeny and Stravinskii, Igor Fedorovich not
Stravinsky, Igor

Disraeli, Benjamin, Earl of Beaconsfield not Beaconsfield,
Benjamin Disraeli, Earl of

but Newcastle, Margaret Cavendish, Duchess of not
Cavendish, Margaret, Duchess of Newcastle

Palestrina, Giovanni Pierluigi da

but Giovanni da Ravenna


Corporate bodies have their equal amount of inconsistencies.
Thus:


Unesco not United Nations Educational, Scientific and 
Cultural Organization and Euratom not European 
Atomic Energy Community 

but North Atlantic Treaty Organization not NATO 


Loyola University, Chicago and Newman Club, Brooklyn
College


Names of libraries follow the same pattern: 


Bibliothèque nationale (France)
Rio de Janeiro, Biblioteca Nacional
National Agricultural Library
Kongelige Bibliotek
British Museum
U.S. Library of Congress
etc.


There are also some beautifully reassuring statements: 
“17 B. Works not of corporate authorship. If the work 
would not be entered under corporate body under the 
provisions of A above or if there is doubt as to whether 
it would, enter it under the heading under which it would
be entered if no corporate body were involved. Make
an added entry under the body unless it functions solely
as publisher.”

We should also mention new spelling rules: Muhammad, 
the prophet (rule 27 A) instead of Muhammed, the Prophet. 

In spite of the above, there are basic improvements of 
the Anglo-American Cataloging Rules over the ALA
Cataloging Rules for Author and Title Entries (1949).
First and above everything else, the new Rules’ inclusion 
of the rules for descriptive cataloging is a welcome innova- 
tion. The descriptive rules follow the long accepted 
Library of Congress rules. They are, however, greatly 
expanded and brought up to date. Chinese, Japanese, 
Korean, Hebrew, Russian, and Yiddish examples are 
welcomed. 

The 153 pages devoted to the problem of descriptive 
cataloging are probably the best part of the total rules. 

The glossary, capitalization, abbreviations, numerals,
punctuation, and diacritics are equally useful. The Index,
however, is very uneven: definition of author has two
references to almost identical statements while Festschriften 
have only one reference (to contents) omitting references 
to entry (33H) and the like. 

In conclusion, we can say that our new rules, although
ignoring the most pressing needs for international standardization
and full implementation of generally accepted principles,
and neglecting computerized processing, are just as
fine as the former rules were at the time of their publication.
The cataloger does not have to measure any more
the distance to the cemetery—he still must however go 
outside the city limits and see if the church is “in the open 
country” (rule 98c). The cataloger now knows how to 
deal with spirits, spiritual media, and spiritual communica- 
tions (rule 13C). 


ANDRÉ NITECKI

Assistant Professor
School of Library Science
Syracuse University


4/67-5R Information Retrieval with Special Reference 
to the Biomedical Sciences. 1966. Wesley Simonton and 
Charlene Mason, Editors. University of Minnesota, Min- 
neapolis. 199 pp. 


This softbound brochure contains the 14 papers and 
corresponding discussions from the Second University of 
Minnesota Library School Institute on Information Re- 
trieval, Minneapolis in November 1965 (the first, “In- 
formation Retrieval Today” was held in September 1962). 
In an introductory paper on “Patterns and Problems,”
Dr. Maurice Visscher of the Physiology Department, University
of Minnesota Medical School, cites at length
from official science information studies and approvingly
quotes Richard Orr. The state of the art of mechanized
indexing is cogently discussed by Mary Elizabeth Stevens 
of the National Bureau of Standards stressing the “re- 
discovery” aspect of our present efforts: the sequential- 
step camera discussed at a library conference in New York 
City in 1853, or the key word-in-context (Chem. Titles,
etc.) index developed independently by Luhn and Ohlman
around 1958 and later traced by the latter at least
a hundred years back to British librarian Andrea Cresta- 
doro’s word-in-title index for the British Museum in the 
1850’s. Miss Stevens also felicitously discourses on dif- 
ferences between “machine-” and “people-indexing,” in 
the context of researches by Don Swanson, John O’Connor,
Tukey and others. M. M. Kessler of MIT discusses
search strategies of his project MAC in some detail; 
Norman Shumway of the National Library of Medicine 
reviews vocabulary construction MEDLARS subject head- 
ings; Louise Darling of the UCLA reports on their 
MEDLARS experiences; Honeywell 800 to 7094, etc.,
describing the procedure in detail. Dr. Joseph Izzo of 
the University of Rochester discusses Index-Medicus cover- 
age and identification of diabetes-related literature; a 
counterpoint on the same theme by Dr. Arnold Lazarow of 
the University of Minnesota follows; what price a tri- 
alogue between Honeywell 800, GE 225 and a Control 
Data `? The "profile" of diabetes literature, drawn in 
wore charts and Y tables, by Elmo Brekhus, an associate
of Dr. Izzo at the University of Minnesota, appears under
the unrevealing title of “Newer Methods of Document 
Handling.” Jacqueline Felter of Union Catalog of Medical 
Periodicals, N. Y. Medical Library Center writes enter- 
tainingly and candidly about programming problems and 
solutions in preparing a computerized Union list of hold- 


ings of medical periodicals in Greater New York libraries; 
while Evelyn Moore, Washington University, St. Louis,
discusses their computerized serials control and book cir- 
culation. Frederick Kilgour of Yale University Library 
outlines in rather general terms the “Basic Systems As- 
sumptions of the Columbia-Harvard-Yale Medical Li- 
braries Computerization Project,” while Mrs. Henriette 
Avram of Library of Congress discusses the card-catalog 
computerization at LC; Dr. M. M. Cummings of the 
National Library of Medicine presents a general plan for 
development of the medical libraries, and Foster Mohr- 
hardt of the National Agricultural Library concludes with 
a discussion of the “National Information Systems.” 
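The key word-in-context index mentioned above is simple enough to sketch. In the Python illustration below, each significant word of a title becomes an entry, shown with the words on either side of it and filed alphabetically. The stop-word list, the column width, and the choice of sample titles are assumptions made for the example; this is not a description of how Chem. Titles is actually produced.

```python
# Assumed list of words too common to serve as index entries.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def kwic(titles, width=30):
    """Return (keyword, line) pairs, one per significant word, filed by keyword."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip(".,;:()")
            if not key or key in STOP_WORDS:
                continue
            left = " ".join(words[:i])[-width:]       # context before the keyword
            right = " ".join(words[i + 1:])[:width]   # context after the keyword
            entries.append((key, f"{left:>{width}}  {word}  {right}"))
    entries.sort(key=lambda entry: entry[0])
    return entries


TITLES = ["Information Retrieval Today", "Newer Methods of Document Handling"]
index = kwic(TITLES)
for _, line in index:
    print(line)
```

Under this scheme a title such as “Newer Methods of Document Handling” is filed under each of its four significant words, so that a reader scanning the column of keywords finds it wherever he happens to look.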

It is not entirely clear for whom this volume is intended, 
in addition to being presumably distributed to all regis- 
trants at no extra cost. The recent proliferation of pub- 
lished proceedings of conferences and colloquia, sessions 
and symposia, on various aspects of biomedical communi- 
cation and documentation, has already arrogantly taken 
so much valuable shelf space, that it is difficult to enthuse 
over another X" thick side-stitched tome of 200 single-spaced
elite-typewritten 8½" × 11" pages, especially as the
absence of any index leaves the reader with no other easy 
pathway through the tome than a one-page table-of-contents 
listing of titles and authors. A list of registrants in this 
meeting, and their identification in the question-and-answer 
sections following each paper would have improved this 
book, because these sections form an articulate and lively 
contrast to occasional pomposities in the papers themselves. 
Virtually all of the charts, tables and specimen printouts 
accompanying the presentation by Shumway, Izzo, Lazarow, 
Brekhus, Felter, Moore, and Avram, are well-chosen and 
useful reference material. Conceivably, any critical com- 
ments about the book are attempts to crash an open door; 
perhaps editor Dr. Simonton modestly felt that the publica-
tion was not deserving of general distribution (no price
is quoted). Yet there is much of permanent reference 
value here for biomedical documentalists; certainly not 
less than in some of the much more expensively printed 
hardbound books put out by well-known publishers on the 
same general subject over the past 10 years. I have little 
doubt but that the volume will find and hold its place 
among its many competitors on the overflowing biomedical 
documentation reference shelf. 


Boris R. Anzlowar
Pharmaco-Medical Documentation
Chatham, New Jersey 07928 


4/67-6R Coordinate Indexing. 1966. John C. Costello,
Jr., Graduate School of Library Service, Rutgers, The
State University (Rutgers Series on Systems for the Intel-
lectual Organization of Information, Vol. VII), Edited by
Dr. Susan Artandi. Supported by the National Science 
Foundation. New Brunswick, New Jersey. 218 pp. 


The objectives of this series are stated in the Preface 
as follows: "The investigation is intended to examine the 
various methods or systems individually, study them in 
depth within the framework of a seminar series, and then 
produce a group of papers which, in addition to being state- 
of-the-art contributions to the scholarship of the field, 
should also serve as a basis for the ultimate objective, sys- 
tems comparisons. Each paper then should be a description, 
a discussion, a critique, a collection of facts and data.” 

Evaluated against the above criteria, this particular vol-
ume falls well short of its goals. It fares best as a descrip-
tion of coordinate indexing systems and the methods and
procedures associated with them. In this capacity it is ex-
cellent and probably no better single how-to-do-it book
exists on the subject. As a discussion in the seminar frame-
work, however, it is terribly overbalanced toward the source
paper (185 pages or 96% of the actual text) and away from 
the comments of the attendant panel of experts—Giuliano, 
Warheit, and Bernier (9 pages or 4% of the text). The
contributions of the panel are well worth reading but are 
so fragmentary that they merely whet the appetite. 

The author does point out most of the well-known
strengths and weaknesses of coordinate indexing systems. 
The way in which he does this, however, hardly qualifies
as a critique, at least in the scholarly sense of the word.
Appended to the volume is an extensive bibliography of 
139 references, only one of which is, I believe, referred to 
specifically in the text. There is an almost complete ab-
sence of utilization of the literature or of study results from
other workers. There is no hard data on how devices such
as roles, links, and weights have fared in tests. There is no 
data on how much more time it takes to index using such 
devices. Generalizations which should cite some experi- 
mental or factual support are commonly offered to the 
reader presumably to be taken on faith (Example: p. 186, 
par. 3). Time and again remarks demanding a reference, a 
source, a footnote, or some of the other paraphernalia nec-
essarily and justifiably associated with state-of-the-art
studies are ignored, leaving the reader, or this reader at
least, rather uneasy.

The intent may have been to respond somewhat to the
call for a “collection of facts and data” through the inser-
tion of Appendix 2, listed in the Table of Contents as a
“Summary of Data for Five Operating Coordinate Indexing
Systems.” This Appendix was, however, missing from the
review copy and no volumes containing Appendix 2 could
yet be located at the time of this writing.

What the book does it does logically, thoroughly, and 
with an ordered approach that organizes a great deal of 
material for the reader. Alternative working methods and
approaches are described in exhaustive detail. The delineable 
steps in standard procedures are outlined fully and expertly.
The author does an excellent job of providing definitions,
collecting synonyms, and indicating other terminological
problems or confusions that exist in the field. However, after
listing a group of synonyms he quite frequently fails to
select one for his own use and continues to run through
the entire string every time he uses the concept (Example:
p. 71, par. 3).

This repetitive, tutorial approach, fine for the classroom
but inappropriate here, plagues the book throughout and
makes it at least a good 25 pages longer than it need be.
Some examples of extensive repetitive passages are in order:


1. Definition of “Related Terms” (p. 94, 170-171).
2. Coordinate indexing can be analytical, or clerical, or 
various combinations of both (p. 20, 28, 31, 45). 


3. Data concerning the document’s physical or biblio-
graphic characteristics are of use as well as data con-
cerning its content (p. 17, 57).

4. P oo make better indexers than specialists (p.
47, 82).
5. t must be familiar with the subject matter (p.
32, 82).


It is difficult to tell whether the redundancy of the text is
to be attributed to the author's background in developing 
instructional manuals and syllabi for the Battelle course on 
Coordinate Indexing or to the fact that he was apparently 
required to follow a Rutgers-provided outline not of his own 
construction. 

The author’s personal preferences come across keenly; for 
example, his very strong bias towards subject-qualified in-
dexers is apparent throughout the paper. This frequently
relates to a negative bias against machine indexing, as in the 
following statement: “The accomplishment of coordinate 
indexing by clerical personnel, or by other personnel not
professionally qualified to comprehend what the document
actually discusses, can be considered as ineffectual as ma-
chine indexing” (p. 30). Machine indexing, as described by
the author, is, however, limited to frequency count indexing. 
The reader will look in vain for any reference in the text to
automatic indexing studies or to the fact that they may
involve the use of criteria other than straight frequency
counts.


Though generally good on terminological problems, the
text occasionally makes the mistake of using a specialized
term quite a long time before it gets around to defining it.
Some examples are “terminal-digit card,” “rdl card,” “re-
call,” and “Relevance.” This could cause many readers
some difficulty.

The four main sections of the text, and their respective 
page lengths, are as follows: Input (90 p.), Store (87 p.),
Searching (17 p.), and Output (16 p.). As can be seen, these 
sections get progressively smaller; relatively speaking, there 
is also a falling off in thoroughness and quality as one 
proceeds. 

In the section on the Store one finds the following state-
ment: “To the extent that microimage chips and film reels
are suitable devices for storage of coordinate indexes, they 
should be used only for static or historical documents which 
will require no change to the stored images" (p. 124). It 
may be of interest that this statement is probably already 
obsolete in view of the development of certain new equip-
ment, the use of which has just been reported in American
Documentation.¹

In that part of the Searching section dealing with the
pros and cons of “generic posting" (p. 166-169), the author 
fails to observe that a computer retrieval system can be 
designed so that the searcher has the option of searching 
automatically on the terms “narrower” or “broader” to the
search terms he has initially selected. Such a system would 
obviate any need for “generic posting” and would save a 
great deal of storage space. 
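
The alternative described above can be illustrated with a minimal sketch. Everything in it is assumed for illustration and none of it comes from the book under review: the small thesaurus, the postings, and the function names are hypothetical. Documents are posted only under the terms actually assigned, and the searcher chooses at search time whether to widen the query to narrower or broader terms.

```python
# Hypothetical thesaurus: each term maps to its immediate narrower terms.
NARROWER = {
    "metals": ["ferrous metals", "nonferrous metals"],
    "ferrous metals": ["steel"],
}

# Hypothetical postings: each document is posted only under the term
# assigned by the indexer, with no generic (up-posted) entries.
POSTINGS = {
    "metals": {"doc3"},
    "nonferrous metals": {"doc2"},
    "steel": {"doc1"},
}


def invert(narrower):
    """Derive the broader-term relation from the narrower-term relation."""
    broader = {}
    for parent, children in narrower.items():
        for child in children:
            broader.setdefault(child, []).append(parent)
    return broader


BROADER = invert(NARROWER)


def closure(term, relation):
    """Return the term plus every term reachable from it through the relation."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(relation.get(current, []))
    return seen


def search(term, expand=None):
    """Search on one term, optionally expanded to 'narrower' or 'broader' terms."""
    if expand == "narrower":
        terms = closure(term, NARROWER)
    elif expand == "broader":
        terms = closure(term, BROADER)
    else:
        terms = {term}
    hits = set()
    for t in terms:
        hits |= POSTINGS.get(t, set())
    return hits
```

In this sketch a search on "metals" with expand="narrower" reaches the documents posted under "steel" and "nonferrous metals" although neither was ever posted under "metals"; the hierarchy is stored once in the thesaurus rather than repeated in every document's postings, which is where the saving in storage would come from.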

The section on Output leaves perhaps the most to be
desired. Among other things, the author says, “Links and
roles, as they have been defined here, have been used in
unit concept indexing only since 1961. As yet, there has
been insufficient experience with their use in retrieval to
permit the preparation of more than a handful of reports
on their effectiveness; and those which have been published
reflect the availability of something less than adequate sub-
stantiating data” (p. 190). Considering the claims that have
been made for these devices over nearly the entire previous
text of the book, most readers will, I am sure, feel some-
what dismayed by this late announcement. The “handful
of reports” are not identified.

The author closes with a series of “Indeterminacy Princi- 
ples,” for which he makes the claim that “the very acknowl- 
edgment of their existence negates the value of statistical 
exercises in the name of research on relevance ratio and 
recall ratio” (p. 193). Reciting over and over again the im-
perfections of perception and communication, like a kind of
litany, he writes himself into such a neat solipsistic corner
that one is surprised to find him contributing to the panel
discussion which follows. 

As a physical object the book is sturdy and well bound. 
Typographically it is mediocre, with section and subsection 
headings not underlined, all-capitalized, or boldfaced, and
therefore melting hopelessly into the rest of the typed text. 
As mentioned previously, Appendix 2 seems to have been
accidentally left out, at least in the copies examined. There
is no index. Typographical errors appear slightly more fre-
quently than one is willing to overlook. A list of some 20
such quibbles, noticed without special searching, is
being sent to the editor. Oddly, Dr. Giuliano’s name ap-
pears incorrectly as “Guiliano” consistently throughout the
report of the Panel Discussion, but correctly elsewhere.


W. T. Brandhorst
Documentation Incorporated
Bethesda, Maryland 


1 Kozumplik, W. A. and Lange, R. T. Computer-Produced Microfilm
Library Catalog. American Documentation, 18: 67-80 (1967).